Abstract

In this thesis, we give efficient approximation algorithms for two classical combinatorial
optimization problems: multicommodity flow problems and shop scheduling problems. The
algorithms we develop for these problems yield solutions that are not necessarily optimal, but come
with  a  provable  performance  guarantee;  that  is,  we  can  guarantee  that  the  solution  found  is 
within  a  certain  percentage  of  the  optimal  solution.  This  type  of  algorithm  is  known  as  an 
approximation  algorithm.  Our  results  show  that  by  allowing  a  small  error  in  the  solution  of  a 
problem,  it  is  often  possible  to  gain  a  significant  reduction  in  the  running  time  of  an  algorithm 
for  that  problem. 

In Chapter 2, we study the multicommodity flow problem. The multicommodity flow
problem involves simultaneously shipping several different commodities from their respective sources
to  their  sinks  in  a  single  network  so  that  the  total  amount  of  flow  going  through  each  edge  is  no 
more  than  its  capacity.  Associated  with  each  commodity  is  a  demand,  which  is  the  amount  of 
that  commodity  that  we  wish  to  ship.  Given  a  multicommodity  flow  problem,  one  often  wants 
to know if there is a feasible flow, i.e., if it is possible to find a flow that satisfies the demands and
obeys the capacity constraints. More generally, we might wish to know the maximum
percentage z such that at least z percent of each demand can be shipped without violating the capacity
constraints. The latter problem is known as the concurrent flow problem. Our algorithms are
approximation algorithms that find $\epsilon$-optimal solutions to the concurrent flow problem, that is,
solutions in which z is within a $(1 - \epsilon)$ factor of the maximum possible value. In particular, we
show that for any $\epsilon > 0$, an $\epsilon$-optimal solution to the n-node, m-edge, k-commodity concurrent
flow problem can be found by a randomized algorithm in $O(\epsilon^{-3} k m n \log k \log^{3} n)$ time and by a
deterministic algorithm in $O(\epsilon^{-2} k^{2} m n \log k \log^{3} n)$ time.

Our  expected  running  time  is  the  same  (up  to  polylog  factors)  as  the  time  needed  to 
compute  k  maximum-flows,  thus  giving  the  surprising  result  that  approximately  computing 
a k-commodity concurrent flow is about as difficult as exactly computing k single-commodity
maximum flows. In fact, we formally prove that a k-commodity concurrent flow problem can be
approximately solved by approximately solving $O(k \log k \log n)$ minimum-cost flow problems.

The multicommodity flow problem has several important applications. Many classical
problems in Operations Research can be phrased as multicommodity flow problems, including
telecommunications problems, import-export problems, freight transport and scheduling,
network design, freight assignment in the less-than-truckload trucking industry, traffic planning,
and busing students to schools. Multicommodity flow can also be used to find good separators
for graphs, yielding divide-and-conquer algorithms for several NP-hard graph problems. In
particular, the results in this thesis can be used to give the fastest polylogarithmic
approximations to several problems including VLSI channel routing, minimum cut linear arrangement,
minimum area layout, $\sqrt{2}$-bifurcators of a graph, minimum feedback-arc set, graph
embedding problems, chordalization of a graph, register sufficiency, minimum deletion of clauses in
a 2CNF≡ formula, via minimization, and the edge-deletion graph bipartization problems.
These  problems  will  be  discussed  in  Chapter  3.  In  addition  we  will  show  how,  in  some  cases, 
our  algorithms  can  be  adapted  to  find  integral  solutions.  The  ability  to  do  so  is  significant, 
since the integral multicommodity flow problem is likely to be more difficult than the
problem of finding an optimal flow that is not necessarily integral: the former problem is NP-hard
whereas  the  latter  is  solvable  in  polynomial  time  via  linear  programming. 

Not  only  do  our  algorithms  have  provably  efficient  running  times,  but  they  perform  well  in 
practice.  In  Chapter  4  we  discuss  an  implementation  of  one  variant  of  the  algorithm  presented 
in  Chapter  2.  The  results,  while  preliminary,  are  rather  encouraging.  For  large  problems  our 
implementation significantly outperforms a good simplex-based linear programming code. In
fact,  we  have  been  able  to  solve  problems  that  are  larger  than  those  that  can  be  solved  by  good 
simplex-based  codes.  In  particular,  we  are  able  to  solve  problems  in  which  there  are  a  large 
number  of  commodities. 

In  Chapter  5,  we  turn  to  the  problem  of  shop  scheduling.  We  give  the  first  polylogarithmic 
approximation  algorithms  for  the  job-shop  problem,  flow-shop  problem,  and  several  extensions. 
Our  algorithms  are  randomized  and  combine  techniques  from  two  seemingly  disparate  fields  of 
study:  vector-sum  theorems  and  packet  routing  algorithms. 

In  Chapter  6,  we  show  how  to  make  the  shop  scheduling  algorithms  deterministic.  Our 
algorithm  makes  use  of  some  recent  extensions  of  our  multicommodity  flow  techniques  and 
unifies  many  of  the  ideas  in  this  thesis,  since  it  is  necessary  to  find  an  approximately  optimal 
integral  solution  to  a  generalized  version  of  the  multicommodity  flow  problem.  The  algorithms 
we  use  are  closely  related  to  those  used  for  the  multicommodity  flow  problem. 

Keywords: Multicommodity Flow, Scheduling, Combinatorial Optimization, Network
Algorithms, Approximation Algorithms, Randomized Algorithms.

Thesis  Supervisor:  David  B.  Shmoys 

Title:  Associate  Professor  of  Industrial  Engineering  and  Operations  Research,  Cornell  University 

Introduction


Given  a  particular  combinatorial  optimization  problem,  there  are  many  possible  approaches  to 
finding  a  solution.  Perhaps  the  most  common  strategy  is  to  rely  on  a  general  purpose  method, 
such  as  linear  programming,  that  solves  a  large  class  of  problems.  The  advantages  of  using  such 
a  method  are  clear  -  a  single  computer  program  that  implements  this  method  can  solve  many 
different  problems.  The  only  effort  involved  in  solving  a  new  problem  in  this  class  is  to  express 
it in the proper form. However, general purpose methods have their disadvantages as well. By
expressing  a  particular  problem  as  an  instance  of  a  more  general  problem,  one  may  ignore  some 
structure  that  makes  the  original  problem  easier  to  solve. 

In  this  thesis,  we  develop  algorithms  that  exploit  the  particular  combinatorial  structure  of 
the  problem  at  hand.  This  approach  leads  to  algorithms  that  are  faster  than  previously  known 
ones,  and  which  are  able  to  solve  larger  sized  instances. 

In particular, we give efficient approximation algorithms for two classes of problems: multicommodity
flow problems and shop scheduling problems. These are basic and classical problems
in  combinatorial  optimization.  The  algorithms  we  develop  for  these  problems  yield  solutions 
that  are  not  necessarily  optimal,  but  come  with  a  provable  performance  guarantee;  that  is,  we 
can  guarantee  that  the  solution  found  is  within  a  certain  percentage  of  the  optimal  solution. 
This  type  of  algorithm  is  known  as  an  approximation  algorithm.  For  many  applications,  an 
optimal  solution  is  not  needed,  either  because  the  application  can  be  solved  just  as  easily  with 
an  approximate  solution,  or  because  the  input  data  itself  may  only  be  accurate  up  to  some 
fixed precision. Our results show that by allowing a small error in the solution of a problem,
it  is  often  possible  to  gain  a  significant  reduction  in  the  running  time  of  an  algorithm  for  that 
problem. 

The  first  problem  we  study  is  the  multicommodity  flow  problem.  The  multicommodity  flow 
problem  involves  simultaneously  shipping  several  different  commodities  from  their  respective 
sources  to  their  sinks  in  a  single  network  so  that  the  total  amount  of  flow  going  through  each 
edge  is  no  more  than  its  capacity.  Associated  with  each  commodity  is  a  demand,  which  is  the 
amount  of  that  commodity  that  we  wish  to  ship.  Given  a  multicommodity  flow  problem,  one 
often  wants  to  know  if  there  is  a  feasible  flow,  i.e.,  if  it  is  possible  to  find  a  flow  that  satisfies 
the  demands  and  obeys  the  capacity  constraints.  More  generally,  we  might  wish  to  know  the 
maximum  percentage  z  such  that  at  least  z  percent  of  each  demand  can  be  shipped  without 
violating  the  capacity  constraints.  The  latter  problem  is  known  as  the  concurrent  flow  problem, 
and  is  equivalent  to  the  problem  of  determining  the  minimum  factor  by  which  the  capacities 
can  be  multiplied  so  that  it  is  possible  to  ship  100%  of  each  demand.  For  our  algorithms,  it 
is  convenient  to  state  the  concurrent  flow  problem  in  a  different,  but  equivalent  form.  We  are 
given  a  network  and  a  set  of  commodities.  Let  the  congestion  of  an  edge  be  the  ratio  of  the 
total  flow  on  that  edge  to  its  capacity.  We  wish  to  find  a  way  to  route  each  commodity  so 
that  the  maximum  edge  congestion  is  minimized.  We  denote  the  value  of  the  maximum  edge 
congestion by $\lambda$ and the minimum possible value of $\lambda$ by $\lambda^*$. Our algorithms are approximation
algorithms that find $\epsilon$-optimal solutions, that is, solutions in which $\lambda \le (1 + \epsilon)\lambda^*$.
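The definitions above can be made concrete with a minimal sketch. The function names, the edge labels and the toy instance are assumptions introduced here for illustration; the instance is not the network of Figure 1.1.

```python
def max_edge_congestion(flows, capacity):
    """Return the largest ratio of total flow to capacity over all edges.

    flows    -- list with one dict per commodity, mapping edge -> flow amount
    capacity -- dict mapping edge -> capacity (assumed positive)
    """
    total = {edge: 0.0 for edge in capacity}
    for commodity_flow in flows:
        for edge, amount in commodity_flow.items():
            total[edge] += amount
    return max(total[edge] / capacity[edge] for edge in capacity)


def is_eps_optimal(congestion, optimum, eps):
    """Check the condition congestion <= (1 + eps) * optimum."""
    return congestion <= (1 + eps) * optimum


# Hypothetical toy instance: two parallel routes, each a single edge of
# capacity 2, and one commodity with demand 2.
capacity = {"top": 2.0, "bottom": 2.0}
all_on_top = [{"top": 2.0}]
split_evenly = [{"top": 1.0, "bottom": 1.0}]

print(max_edge_congestion(all_on_top, capacity))    # 1.0
print(max_edge_congestion(split_evenly, capacity))  # 0.5
```

In this toy instance, splitting the demand evenly halves the maximum edge congestion, which is the quantity the concurrent flow problem asks us to minimize.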

An  example  of  a  concurrent  flow  problem  appears  in  Figure  1.1.  The  input  consists  of  a 
network  and  a  specification  of  the  commodities.  Each  edge  is  labeled  with  its  capacity.  The  goal 
is  to  find  a  solution  that  sends  1  unit  of  flow  between  and  Vj,  1  unit  of  flow  between  and 
V4  and  2  units  of  flow  between  Vj  and  v^.  In  Figures  1.2  and  1.3,  we  give  two  solutions  to  the 
concurrent  flow  problem.  In  Figure  1.2,  it  is  easy  to  verify  that  the  demands  are  all  satisfied. 
Further, the maximum edge congestion $\lambda = 1$, because on all edges the flow is less than or equal
to the capacity. In Figure 1.3, the maximum edge congestion $\lambda = 1/2$, because on all edges the
flow  is  less  than  or  equal  to  one  half  the  capacity.  As  we  will  prove  in  Chapter  2,  the  solution 
in Figure 1.3 is actually the optimal solution to this problem. Throughout the thesis, we use n,
m and k to denote the number of nodes, edges and commodities, respectively. We assume that the
demands and the capacities are integral, and use D and U to denote the largest demand and the
largest capacity, respectively. For the example in Figure 1.1, n = 5, m = 6, k = 3, D = 2 and U = 4.

Figure 1.1: A sample problem

Figure 1.2: A suboptimal solution

Figure 1.3: An optimal solution

In this thesis, we describe the first combinatorial approximation algorithms for the concurrent
flow problem. Given any positive $\epsilon$, the algorithms find $\epsilon$-optimal solutions. The running
times of the algorithms depend polynomially on $\epsilon^{-1}$, and are significantly better than the
running times of previous algorithms when $\epsilon$ is a constant. In other words, by trading a small
amount of accuracy, we are able to obtain large improvements in the time needed to solve a
multicommodity flow problem. As an example of the running times we can achieve, we state one
of  our  results.  We  define  the  simple  concurrent  flow  problem  to  be  a  concurrent  flow  problem 
in  which  each  commodity  has  exactly  one  source  and  one  sink. 


Theorem 1.0.1 For any $\epsilon > 0$, an $\epsilon$-optimal solution for the simple concurrent flow problem
can be found by a randomized algorithm in $O(\epsilon^{-3} k m n \log k \log^{3} n)$ time and by a deterministic
algorithm in $O(\epsilon^{-2} k^{2} m n \log k \log^{3} n)$ time.

Our  expected  running  time  is  the  same  (up  to  polylog  factors)  as  the  time  needed  to 
compute  k  maximum-flows,  thus  giving  the  surprising  result  that  approximately  computing 
a k-commodity concurrent flow is about as difficult as exactly computing k single-commodity
maximum flows. In fact, we formally prove that a k-commodity concurrent flow problem can be
approximately solved by approximately solving $O(k \log k \log n)$ minimum-cost flow problems.

The  only  previously-known  algorithms  for  solving  the  general  concurrent  flow  problem  use 
linear  programming.  The  concurrent  flow  problem  can  be  formulated  as  a  linear  program 
in $O(mk)$ variables and $O(nk + m)$ constraints. Any polynomial-time linear programming
algorithm can be used to solve the problem optimally. Kapoor and Vaidya [30] gave a method
to speed up the matrix inversions involved in Karmarkar-type algorithms for multicommodity
flow problems; combining their technique with Vaidya's linear programming algorithm that uses
fast matrix multiplication [67] yields a time bound of $O(k^{3.5} n^{3} m^{0.5} \log(nDU))$ for the concurrent
flow problem with integer demands and an $O(k^{3.5} n^{2} m^{0.5} \log(n\epsilon^{-1}DU))$ time bound to find an
approximate solution. When $\epsilon$ is not too small, for example when $\epsilon = O(1)$, our algorithm
is faster for all possible instances of a simple concurrent flow problem.
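For reference, one standard way to write the concurrent flow problem as a linear program is sketched below. The notation is an assumption made here, not taken from the text: the network is treated as directed for simplicity, $f_i(v,w)$ is the flow of commodity $i$ on edge $(v,w)$, $u(v,w)$ is the capacity of that edge, and $d_i$, $s_i$, $t_i$ are the demand, source and sink of commodity $i$.

```latex
\begin{align*}
\text{minimize}\quad & \lambda \\
\text{subject to}\quad
& \sum_{w:(v,w)\in E} f_i(v,w) - \sum_{w:(w,v)\in E} f_i(w,v) =
  \begin{cases} d_i & \text{if } v = s_i \\ -d_i & \text{if } v = t_i \\ 0 & \text{otherwise} \end{cases}
  && \text{for all } i \text{ and all } v \in V, \\
& \sum_{i=1}^{k} f_i(v,w) \le \lambda\, u(v,w) && \text{for all } (v,w) \in E, \\
& f_i(v,w) \ge 0 && \text{for all } i \text{ and all } (v,w) \in E.
\end{align*}
```

Counting terms in this sketch gives one variable per commodity and edge plus $\lambda$, that is $O(mk)$ variables, together with $nk$ conservation constraints and $m$ capacity constraints, in line with the $O(nk + m)$ bound stated above.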

Before  continuing,  we  emphasize  the  difference  between  approximation  algorithms  and  the 
common  approach  of  solving  problems  through  the  use  of  heuristics.  Heuristics  are  procedures 
that  are  applied  when  it  is  deemed  impractical  to  use  an  algorithm  that  always  finds  the  optimal 
solution.  Heuristics  typically  run  much  faster  than  algorithms  that  find  optimal  solutions,  and 
while  they  may  often  find  solutions  that  are  optimal  or  very  close  to  optimal,  there  are  no 
guarantees  on  the  quality  of  the  solution  found.  In  contrast,  the  algorithms  in  this  thesis  all 
come  with  guarantees. 

Our  approach  to  solving  the  concurrent  flow  problem  can  be  easily  understood  in  pseudo- 
economic terms. The essential complexity of the problem arises because the different commodities
are all competing for the same scarce resource, the capacities of the edges. In order to
model  this  process,  we  introduce  a  pricing  scheme.  Consider  a  particular  flow,  say  the  one 
that appears in Figure 1.2. We introduce a price on each edge, to represent the congestion, or
percentage  utilization  of  that  edge.  If  an  edge  is  heavily  congested,  it  has  a  high  price,  and  if 
an  edge  is  lightly  congested  it  has  a  low  price.  For  example,  edge  ViWa,  which  has  congestion 
2/2  =  1,  has  a  higher  price  than  edge  which  has  congestion  1/4.  Once  we  have  these 
prices,  a  particular  routing  for  a  commodity  has  a  cost,  which  is  based  on  the  prices  of  the 
edges  that  the  commodity  is  using.  Consider  commodity  3.  It  sends  flow  over  the  two  edges 
with  maximum  congestion,  i.e.,  the  two  highest  priced  edges.  A  cheaper  way  to  send  its  flow 
might be over the bottom path $v_1v_2v_4v_5$. Our algorithm recognizes this situation and reroutes
some of the flow of commodity 3 off its current path and onto the path $v_1v_2v_4v_5$. For example,
if half the flow were rerouted, we would obtain the flow depicted in Figure 1.3. After rerouting,
the congestion of the edges changes, and hence the prices change.
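The pricing idea can be illustrated with a small sketch. The exponential price function, the parameter `alpha`, the edge labels and the data are all assumptions chosen for illustration; the description above only requires that heavily congested edges cost more than lightly congested ones.

```python
import math


def edge_price(flow, capacity, alpha=4.0):
    """Price of an edge as an increasing function of its congestion.

    The exponential form and the value of alpha are assumptions made for
    this sketch, not the thesis's definition.
    """
    return math.exp(alpha * flow / capacity)


def path_cost(path, flow, capacity, alpha=4.0):
    """Cost of routing along a path: the sum of the prices of its edges."""
    return sum(edge_price(flow[e], capacity[e], alpha) for e in path)


# Hypothetical data: a two-edge path whose edges are fully used, and a
# three-edge path whose edges are each at congestion 1/4.
capacity = {"a": 2.0, "b": 2.0, "c": 4.0, "d": 4.0, "e": 4.0}
flow = {"a": 2.0, "b": 2.0, "c": 1.0, "d": 1.0, "e": 1.0}
congested_path = ["a", "b"]
light_path = ["c", "d", "e"]

# The longer path is cheaper because its edges are lightly congested.
cheaper = path_cost(light_path, flow, capacity) < path_cost(congested_path, flow, capacity)
print(cheaper)
```

With these numbers the three-edge path costs less than the two-edge path, so a rerouting step guided by prices would move flow onto it even though it uses more edges.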

While  this  description  is  a  highly  simplified  version  of  our  algorithm,  it  does  capture  the 
essential  ideas.  The  key  to  the  efficiency  of  our  algorithm  is  twofold.  First,  we  can  phrase 
this  question  of  finding  a  cheap  way  to  route  flow  as  a  minimum-cost  flow  problem,  which 
is  a  “well-solved”  problem.  Second,  we  can  show  that,  for  the  right  choice  of  parameters,  a 
rerouting  procedure  does  not  have  to  be  executed  too  many  times.  This  is  the  difficult  part 
of the analysis. We need to show that every iteration of our algorithm makes progress, for
some suitably defined notion of progress. We also need to be able to detect when our solution
is $\epsilon$-optimal. Note that we are requiring that we can detect when our solution is within a
$(1 + \epsilon)$ factor of optimal, without knowing what the optimal value is. To do this, we develop a
notion  of  relaxed  optimality  and  a  detection  scheme  that  uses  suitably  relaxed  versions  of  the 
complementary  slackness  conditions  of  linear  programming. 

A  detailed  description  of  this  algorithm,  along  with  algorithms  for  an  important  special 
case,  that  in  which  all  edges  have  capacity  1,  appears  in  Chapter  2. 

The multicommodity flow problem has several important applications. Many classical problems
in Operations Research can be phrased as multicommodity flow problems, including:

•  telecommunications  problems, 

•  import-export  problems, 

•  freight  transport  and  scheduling, 

•  network  design, 

•  freight assignment in the less-than-truckload trucking industry,

•  traffic  planning,  and 

•  busing  students  to  schools. 

Multicommodity flow can also be used to find good separators for graphs, yielding divide-and-conquer
algorithms for several NP-hard graph problems. In particular, the results in this
thesis can be used to give the fastest polylogarithmic approximations to a number of problems
including:

•  VLSI  channel  routing, 

•  minimum  cut  linear  arrangement, 

•  minimum  area  layout, 

•  $\sqrt{2}$-bifurcators of a graph,

•  minimum  feedback-arc  set, 

•  graph  embedding  problems, 

•  chordalization  of  a  graph, 

•  register  sufficiency, 

•  minimum deletion of clauses in a 2CNF≡ formula,

•  via  minimization,  and 

•  the  edge-deletion  graph  bipartization  problems. 

These  problems  will  be  discussed  in  Chapter  3.  In  addition  we  will  show  how,  in  some  cases,  our 
algorithms can be adapted to find integral solutions. The ability to do so is significant, since
the  integral  multicommodity  flow  problem  is  likely  to  be  more  difficult  than  the  problem  of 
finding  an  optimal  flow  that  is  not  necessarily  integral:  the  former  problem  is  NP-hard  whereas 
the  latter  is  solvable  in  polynomial  time  via  linear  programming. 

Not  only  do  our  algorithms  have  provably  efficient  running  times,  but  they  perform  well  in 
practice.  In  Chapter  4  we  discuss  an  implementation  of  one  variant  of  the  algorithm  presented 
in  Chapter  2.  The  results,  while  preliminary,  are  rather  encouraging.  For  large  problems  our 
implementation  significantly  outperforms  a  good  simplex-based  linear  programming  code.  In 
fact,  we  have  been  able  to  solve  problems  that  are  larger  than  those  that  can  be  solved  by  good 
simplex-based  codes.  In  particular,  we  are  able  to  solve  problems  in  which  there  are  a  large 
number  of  commodities. 

In Chapter 5, we turn to the problem of shop scheduling. We give the first polylogarithmic
approximation  algorithms  for  the  job-shop  problem,  flow-shop  problem,  and  several  extensions. 
Our  algorithms  are  randomized  and  combine  techniques  from  two  seemingly  disparate  fields  of 
study:  vector-sum  theorems  and  packet  routing  algorithms. 

In  Chapter  6,  we  show  how  to  make  the  shop  scheduling  algorithms  deterministic.  Our 
algorithm  makes  use  of  some  recent  extensions  of  our  multicommodity  flow  techniques  and 
unifies  many  of  the  ideas  in  this  thesis,  since  it  is  necessary  to  find  an  approximately  optimal 
integral  solution  to  a  generalized  version  of  the  multicommodity  flow  problem.  The  algorithms 
we  use  are  closely  related  to  those  used  for  the  multicommodity  flow  problem. 

Throughout  this  thesis,  we  assume  familiarity  with  the  basic  concepts  of  linear  and  integer 
programming. While none of our algorithms actually relies on a procedure for linear programming,
some of the proofs rely on well-known results about linear programming. We refer the
reader who is unfamiliar with linear programming to a basic textbook such as that of Chvátal
[11] or Schrijver [55].

We include a glossary of notation.


Chapter  2 


Multicommodity  Flow  Algorithms* 


2.1  Introduction 

The multicommodity flow problem involves simultaneously shipping several different commodities
from their respective sources to their sinks in a single network so that the total amount
of  flow  going  through  each  edge  is  no  more  than  the  edge’s  capacity.  Associated  with  each 
commodity  is  a  demand,  which  is  the  amount  of  that  commodity  that  we  wish  to  ship.  Given 
a  multicommodity  flow  problem,  one  often  wants  to  know  if  there  is  a  feasible  flow,  i.e.,  if  it 
is  possible  to  find  a  flow  that  satisfies  the  demands  and  obeys  the  capacity  constraints.  More 
generally,  we  might  wish  to  know  the  maximum  percentage  z  such  that  at  least  z  percent  of 
each  demand  can  be  shipped  without  violating  the  capacity  constraints.  The  latter  problem 
is  known  as  the  concurrent  flow  problem,  and  is  equivalent  to  the  problem  of  determining  the 
minimum  ratio  by  which  the  capacities  must  be  uniformly  increased  in  order  to  ship  100%  of 
each  demand.  For  our  algorithms,  it  is  convenient  to  state  the  concurrent  flow  problem  in  a 
different, but equivalent form. We are given a network and a set of commodities. Let the
congestion  of  an  edge  be  the  ratio  of  the  flow  on  that  edge  to  its  capacity.  We  wish  to  find  a 
way  to  route  each  commodity  so  that  the  maximum  edge  congestion  is  minimized.  We  denote 
the value of the maximum edge congestion by $\lambda$ and the minimum possible maximum edge
congestion by $\lambda^*$. Our algorithms are approximation algorithms that find $\epsilon$-optimal solutions,
ones in which $\lambda \le (1 + \epsilon)\lambda^*$.

*This chapter contains joint work with Tom Leighton, Fillia Makedon, Serge Plotkin, Éva Tardos and Spyros
Tragoudas [42] and joint work with Philip Klein, Serge Plotkin and Éva Tardos [35].

In this chapter, we describe the first combinatorial approximation algorithms for the concurrent
flow problem. Given any positive $\epsilon$, the algorithms find an $\epsilon$-optimal solution. The
running times of the algorithms depend polynomially on $\epsilon^{-1}$ and are significantly better than
those of previous algorithms when $\epsilon$ is a constant. Throughout, we use n, m and k to denote
the number of nodes, edges and commodities, respectively. We assume that the demands and
the capacities are integral, and use D and U to denote the largest demand and the largest
capacity, respectively. We also assume, for now, that each commodity has one source and one
sink. We refer to this problem as the simple multicommodity flow problem. With this notation,
we prove the following result.

Theorem 2.1.1 For any $\epsilon > 0$, an $\epsilon$-optimal solution for the simple concurrent flow problem
can be found by a randomized algorithm in $O(\epsilon^{-3} k m n \log k \log^{3} n)$ time and by a deterministic
algorithm in $O(\epsilon^{-2} k^{2} m n \log k \log^{3} n)$ time.

A complete table of results appears at the end of this introduction in Figure 2.1.

Our expected running time is the same (up to polylog factors) as the time needed to compute
k maximum flows, thus giving the surprising result that approximately computing a k-commodity
concurrent flow is about as difficult as computing k single-commodity maximum
flows. In fact, we formally prove that an instance of a k-commodity flow problem can be
approximately solved by approximately solving $O(k \log k \log n)$ minimum-cost flow problems.

The running times in the above theorem can be improved when k is large. Let $k^*$ denote
the number of different sources. In both the randomized and the deterministic algorithm we
can replace k in the running time by $k^*$ at the expense of having to replace one of the $\log n$
terms by a $\log(nU)$. Notice that $k^*$ is at most n for all simple multicommodity flow problems.

As  a  consequence  of  our  approximation  algorithm  for  the  concurrent  flow  problem,  we  obtain 
a  relaxed  decision  procedure  for  multicommodity  flow  feasibility;  that  is,  given  an  instance  of 
the  multicommodity  flow  problem,  we  can  either  prove  that  it  is  infeasible,  or  give  a  feasible 
flow for the problem in which the capacity of each edge is increased by a factor of $1 + \epsilon$. Since,
in practice, the input to a multicommodity flow problem may have some measurement error,
by making $\epsilon$ small enough, we can obtain a procedure for determining feasibility up to the
precision of the input data.
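The relaxed decision procedure follows from the approximation guarantee by a short case analysis, sketched below. The function name and return strings are hypothetical; the input is assumed to be the maximum edge congestion of a flow that satisfies all demands and is within a factor of $1 + \epsilon$ of the minimum possible congestion.

```python
def relaxed_feasibility(congestion, eps):
    """Interpret the congestion of an eps-optimal concurrent flow.

    congestion -- maximum edge congestion of the flow found, assumed to be
                  at most (1 + eps) times the minimum possible congestion
    eps        -- the accuracy parameter, assumed positive
    """
    if congestion <= 1 + eps:
        # Flow on every edge is at most (1 + eps) times its capacity.
        return "feasible with capacities scaled by (1 + eps)"
    # congestion > 1 + eps forces the minimum possible congestion to exceed
    # 1, so no flow satisfies all demands within the original capacities.
    return "infeasible"


print(relaxed_feasibility(1.05, 0.1))
print(relaxed_feasibility(1.30, 0.1))
```

The second branch is where the guarantee is used: if the congestion found exceeds $1 + \epsilon$, then the optimum is at least that congestion divided by $1 + \epsilon$, which is greater than 1.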

An  important  special  case  of  the  concurrent  flow  problem  occurs  when  all  edge  capacities  are 
1. For this special case, we can give even faster algorithms. The algorithms for the case when
the edge capacities are 1 solve a series of shortest path problems instead of a series of minimum-cost
flow problems. The shortest-path variant of the algorithm performs more iterations than
the minimum-cost flow version, but each iteration of the shortest-path variant runs in less time than
an iteration of the minimum-cost flow based variant. In some cases, the shortest-path based algorithm
yields faster running times. Historically, this shortest-path variant for the unit-capacity case preceded
the general minimum-cost flow based variant. In fact, the original version of the algorithm
for the general case used a series of shortest path computations, rather than minimum-cost
flow  computations.  In  spite  of  the  fact  that  a  minimum-cost  flow  can  be  found  via  a  series 
of  shortest  path  computations,  by  doing  the  minimum-cost  flow  computations  directly,  we  are 
able  to  obtain  faster  algorithms. 

The  only  previously  known  algorithms  for  solving  (or  approximately  solving)  the  general 
concurrent flow problem use linear programming. The concurrent flow problem can be formulated
as a linear program in $O(mk)$ variables and $O(nk + m)$ constraints. Any polynomial-time
linear programming algorithm can be used to solve the problem optimally. Kapoor
and Vaidya [30] gave a method to speed up the matrix inversions involved in Karmarkar-type
algorithms for multicommodity flow problems. Combining their technique with Vaidya's linear
programming algorithm that uses fast matrix multiplication [67] yields a time bound of
$O(k^{3.5} n^{3} m^{0.5} \log(nDU))$ to obtain an optimal solution to the concurrent flow problem with integer
demands and an $O(k^{3.5} n^{2} m^{0.5} \log(n\epsilon^{-1}DU))$ time bound to find an approximate solution.

The  only  previous  combinatorial  polynomial  approximation  algorithms  for  concurrent  flow 
problems  only  handle  the  special  case  when  all  the  capacities  are  1.  For  this  special  case, 
Shahrokhi and Matula [59] gave an algorithm that ran in $O(\epsilon^{-5} n m^{7})$ time. Our algorithm is
based  on  this  work,  so  we  describe  the  basic  ideas  here. 

The  algorithm  starts  by  finding  a  flow  that  satisfies  the  demands  but  not  the  capacity 
constraints.  The  algorithm  then  repeatedly  reroutes  flow  so  as  to  decrease  the  maximum  flow 
on any edge. To guide the rerouting, the algorithm assigns a length to each edge and then reroutes flow off
a path that is long with respect to these lengths onto one that is short with respect to these
lengths.
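A single rerouting step of this kind can be sketched as follows. This is an illustration under stated assumptions, not the rule used by Shahrokhi and Matula: the exponential length function, the parameter `alpha`, the amount of flow moved and the four-node unit-capacity instance are all hypothetical.

```python
import heapq
import math


def edge(v, w):
    """Key for an undirected edge."""
    return frozenset((v, w))


def shortest_path(adj, length, source, sink):
    """Dijkstra's algorithm; returns the edges of a shortest source-sink path.

    Assumes the sink is reachable from the source.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for w in adj[v]:
            nd = d + length[edge(v, w)]
            if nd < dist.get(w, math.inf):
                dist[w] = nd
                prev[w] = v
                heapq.heappush(heap, (nd, w))
    path, v = [], sink
    while v != source:
        path.append(edge(prev[v], v))
        v = prev[v]
    return path[::-1]


def reroute(flow, old_path, new_path, amount):
    """Move `amount` units of flow from old_path to new_path, in place."""
    for e in old_path:
        flow[e] -= amount
    for e in new_path:
        flow[e] += amount


# Hypothetical unit-capacity instance: two disjoint s-t paths, with both
# units of a single commodity initially routed through node "a".
adj = {"s": ["a", "b"], "a": ["s", "t"], "b": ["s", "t"], "t": ["a", "b"]}
flow = {edge("s", "a"): 2.0, edge("a", "t"): 2.0,
        edge("s", "b"): 0.0, edge("b", "t"): 0.0}
old_path = [edge("s", "a"), edge("a", "t")]

alpha = 1.0  # assumed parameter of the length function
length = {e: math.exp(alpha * x) for e, x in flow.items()}
new_path = shortest_path(adj, length, "s", "t")
reroute(flow, old_path, new_path, 1.0)
print(max(flow.values()))  # 1.0
```

In this toy instance the heavily used path through "a" is long under the assumed lengths, the unused path through "b" is short, and moving one unit lowers the maximum flow on any edge from 2 to 1.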

In  the  unit  capacity  case,  our  approach  differs  from  that  of  Shahrokhi  and  Matula  in  several 
ways.  We  develop  a  framework  of  relaxed  optimality  conditions  that  allows  us  to  measure  the 
congestion  on  both  a  local  and  a  global  level,  thereby  giving  us  more  freedom  in  choosing  which 
flow  paths  to  reroute  at  each  iteration.  We  exploit  this  freedom  by  using  a  faster  randomized 
method  for  choosing  flow  paths.  In  addition,  this  framework  also  allows  us  to  achieve  greater 
improvement  as  a  result  of  each  rerouting. 

In  the  general  case,  we  must  first  develop  the  appropriate  framework  to  handle  general 
capacities.  We  will  develop  more  general  relaxed  optimality  conditions.  Also,  we  are  able  to 
reroute  an  entire  commodity  during  each  iteration  instead  of  only  a  single  path  of  flow.  To  do 
this rerouting, we compute a minimum-cost flow in an auxiliary graph and reroute a portion
of  the  flow  accordingly.  As  a  consequence,  we  are  able  to  make  much  greater  progress  during 
each  iteration.  Of  course,  the  time  to  run  each  iteration  goes  up,  but  the  tradeoff  proves  to  be 
worthwhile  since  the  improvement  obtained  in  each  iteration  is  large  enough  so  that  we  need 
to solve only $O(k \log k \log n)$ minimum-cost flow problems in order to get an approximately
optimal  solution. 

The running times of the presented algorithms depend polynomially on $\epsilon^{-1}$. The deterministic
algorithm runs in time proportional to $\epsilon^{-2}$ and the randomized one runs in time proportional
to $\epsilon^{-3}$. Goldberg [20] and Grigoriadis and Khachiyan [26] have
shown how to improve the dependence on $\epsilon$ of the randomized algorithm to $\epsilon^{-2}$.

Our  model  of  computation  is  the  RAM.  We  shall  use  the  elementary  arithmetic  operations 
(addition,  subtraction,  comparison,  multiplication,  and  integer  division),  and  count  each  of 
these as a single step. All numbers occurring throughout the computation will have $O(\log(nU))$
bits.  For  ease  of  exposition  we  shall  first  use  a  model  of  computation  that  allows  exact  arithmetic 
on  real  numbers  and  assumes  that  exponentiation  is  a  single  step.  We  then  show  how  to  convert 
the  results  to  the  usual  RAM  model. 




Scenario                         Running Time

General, randomized              $O(mnk(\epsilon^{-3} + \log k) \min\{\log(n^2/m), \log\log n\} \log n)$
                                 $O(mnk^*(\epsilon^{-3} + \log k^*) \min\{\log(n^2/m), \log\log(nU)\} \log(nU))$

General, deterministic           $O(mnk^2(\epsilon^{-2} + \log k) \min\{\log(n^2/m), \log\log n\} \log n)$
                                 $O(mn(k^*)^2(\epsilon^{-2} + \log k^*) \min\{\log(n^2/m), \log\log(nU)\} \log(nU))$

Unit capacity, randomized        $O((k\epsilon^{-1} + m\epsilon^{-3} \log n)(m + n \log n))$
                                 $O(k^* m^{3/2} \log^2 n \, (\epsilon^{-3} + \log k))$

Unit capacity, deterministic     $O(((k + \epsilon^{-2} m) \log n)(k^* n \log n + m(\log n + \min\{k, k^*(\log d_{\max} + 1)\})))$
                                 $O(k^* m^{3/2} \log^2 n \, (\epsilon^{-2} + \log k))$

Unit capacity, $\epsilon = O(1)$,
randomized                       $O(m(k + m) \log n)$

Unit capacity, $\epsilon = O(1)$,
deterministic                    $O(m(k + m)(\log n + \min\{k, k^*(\log d_{\max} + 1)\} \log n))$

Unit capacity, unit demand,
$\epsilon = O(1)$, randomized    $O(m \log n \log k \, (m + n \log n + k \log k))$

Figure 2.1: Some of our running times for the multicommodity flow problem. The bounds
are for an $n$-node, $m$-edge graph with maximum edge capacity $U$ and maximum demand $d_{\max}$.
There are $k$ commodities and $k^*$ distinct sources.

2.2  Preliminaries 

Throughout this chapter we use the notation $A^S$, where $A \in \{\mathbf{R}, \mathbf{Z}, \mathbf{R}_+, \mathbf{Z}_+\}$ and $S \in \{V, E\}$,
to denote an $|S|$-dimensional vector in which each element is a member of the set $A$. The
component of $\beta \in A^V$ corresponding to, say, node $v$ is denoted by $\beta(v)$.

An instance $I = (G, u, \mathcal{K})$ of the simple multicommodity flow problem consists of an
undirected graph $G = (V, E)$ with vertex set $V$ and edge set $E$, a capacity vector $u \in \mathbf{R}_+^E$, and
a specification $\mathcal{K}$ of $k$ commodities, numbered 1 through $k$, where the specification for
commodity $i$ consists of a source-sink pair $s_i, t_i \in V$ and a non-negative demand $d_i$. We denote the
number of distinct sources by $k^*$, the number of nodes by $n$, and the number of edges by $m$.
For notational convenience we assume that $m \ge n$, and that the graph $G$ is connected and has
no parallel edges. Also, for notational convenience, we arbitrarily direct each edge. If there is
an edge directed from $v$ to $w$, this edge is unique by assumption, and we denote it by $vw$. We
assume that the capacities and the demands are integral, and denote the largest capacity by $U$
and the sum of the demands by $D$.

A multicommodity flow $f$ consists of $k$ vectors $f_i$, $i = 1, \ldots, k$, where $f_i \in \mathbf{R}^E$. The quantity
$f_i(vw)$ represents the flow of commodity $i$ on edge $vw$. If the flow of commodity $i$ on edge $vw$
is oriented in the same direction as edge $vw$, then $f_i(vw)$ is positive; otherwise it is negative.
The signs only serve to indicate the direction of the flows. For each commodity $i$ we require
the conservation constraints:

$$\sum_{wv \in E} f_i(wv) - \sum_{vw \in E} f_i(vw) = 0 \quad \text{for each node } v \notin \{s_i, t_i\}, \tag{2.1}$$

$$\sum_{vw \in E} f_i(vw) = d_i \quad \text{for } v = s_i, \tag{2.2}$$

$$\sum_{wv \in E} f_i(wv) = d_i \quad \text{for } v = t_i. \tag{2.3}$$

Note that (2.1) and (2.2) imply (2.3). Alternatively, we can define the flow of a commodity
in the following way. Let $\mathcal{P}_i$ denote a collection of paths from $s_i$ to $t_i$ in $G$, and let $f_i(P)$ be
a nonnegative value for every path $P$ in $\mathcal{P}_i$ that represents the amount of flow carried by path
$P$. The value of the flow thus defined is $\sum_{P \in \mathcal{P}_i} f_i(P)$, which is the total flow delivered from $s_i$
to $t_i$. The amount of flow through an edge $vw$ is

$$f_i(vw) = \sum \{ f_i(P) : P \in \mathcal{P}_i \text{ and } vw \in P \}.$$

We will use both formulations as convenient.

We define the value of the total flow on edge $vw$ to be $f(vw) = \sum_i |f_i(vw)|$. We say that a
multicommodity flow $f$ in $G$ is feasible if $f(vw) \le u(vw)$ for all edges $vw$. (Note that $f(vw)$ is
always non-negative.)

We consider the optimization version of this problem, called the simple concurrent flow
problem, first defined by Shahrokhi and Matula [59]. In this problem the objective is to compute
the maximum possible value $z$ such that there is a feasible multicommodity flow with demands
$z \cdot d_i$ for $1 \le i \le k$. We call $z$ the throughput of the multicommodity flow. An equivalent
formulation of the concurrent flow problem is to compute the minimum $\lambda = 1/z$ such that
there is a feasible flow with demands $d_i$ and capacities $\lambda \cdot u(vw)$. We shall use the notation
$\lambda(vw)$ to denote the congestion $f(vw)/u(vw)$ of an edge $vw \in E$, $\lambda = \max_{vw \in E} \lambda(vw)$, and $\lambda^*$
to denote the optimal (minimum) value of $\lambda$. We can now restate the concurrent flow problem
as follows:

Simple Concurrent Flow Problem (restatement) Given an instance $I$ of the multicommodity
flow problem, find a flow that satisfies the conservation constraints (2.1)-(2.3) and
minimizes the congestion $\lambda$.
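The definitions above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the small triangle instance are assumptions made for the example, not part of the thesis. It also assumes that no commodity uses an edge in both directions, so that adding up the path flows crossing an edge gives $f(vw)$.

```python
from collections import defaultdict
from fractions import Fraction


def edge_flows(path_flows):
    """Total flow f(vw) on each undirected edge, given (path, amount) pairs."""
    f = defaultdict(Fraction)
    for path, amount in path_flows:
        for v, w in zip(path, path[1:]):
            f[frozenset((v, w))] += amount
    return f


def congestion(path_flows, capacity):
    """lambda = maximum over all edges of f(vw) / u(vw)."""
    f = edge_flows(path_flows)
    return max(f[e] / capacity[e] for e in capacity)


# Hypothetical instance: a triangle on nodes a, b, c.
capacity = {frozenset("ab"): 4, frozenset("bc"): 1, frozenset("ac"): 1}
path_flows = [
    (["a", "c"], 1),       # commodity 1 (a to c, demand 2): direct path
    (["a", "b", "c"], 1),  # commodity 1: detour through b
    (["a", "b"], 1),       # commodity 2 (a to b, demand 1)
]
lam = congestion(path_flows, capacity)
throughput = 1 / lam  # z = 1 / lambda
```

In this toy instance the edges $bc$ and $ac$ are saturated, so the routing has congestion 1 and throughput 1.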

A multicommodity flow $f$ satisfying the demands $d_i$, $i = 1, \ldots, k$, is $\epsilon$-optimal if its
congestion $\lambda$ is at most a factor $(1 + \epsilon)$ more than the minimum possible value; that is, $\lambda \le (1 + \epsilon)\lambda^*$.
The approximation problem associated with the concurrent flow problem is to find an $\epsilon$-optimal
multicommodity flow $f$. We shall assume implicitly throughout that $\epsilon$ is at least inverse
polynomial in $n$ and is at most 1. This assumption is without loss of generality. If $\epsilon > 1$, we can run
the algorithm for $\epsilon = 1$. If $\epsilon^{-1}$ is greater than any polynomial in $n$, our algorithms still yield
a correct solution. In this case, however, the running times of our algorithms are somewhat
greater and will be dominated by the time to solve the problem exactly.

We  can  extend  the  results  in  this  chapter  to  the  case  where  the  input  graph  is  directed.  In 
this  case  we  require  all  edge  flows  to  be  non-negative  and  oriented  in  the  same  direction  as  the 
corresponding  edges  in  the  input  graph.  The  results  in  this  chapter  carry  through  to  this  case 
with  slight  notational  changes.  Henceforth,  we  focus  only  on  the  undirected  case. 

The general multicommodity flow problem is a natural extension of the simple problem
in which each commodity may have more than one source and sink. For each commodity $i$ we
are given an $n$-dimensional demand vector $d_i \in \mathbf{Z}^V$, where the component $d_i(v_j)$ denotes
the demand for commodity $i$ at node $v_j$. A negative demand denotes a supply. We require
that the total demand equal the total supply, i.e., $\sum_{v \in V} d_i(v) = 0$. We use $d_{\max}$ to denote
$\max_{i,v} |d_i(v)|$. The conservation constraints of equations (2.1)-(2.3) are replaced by the more
general conservation constraints:

$$\sum_{wv \in E} f_i(wv) - \sum_{vw \in E} f_i(vw) = d_i(v) \qquad i = 1, \ldots, k; \ v \in V. \tag{2.4}$$

Many of our results can be extended to this slightly more general model, although we shall
not address this issue in this thesis. The main point in introducing this model is to reduce the
number of commodities. We will show that every simple concurrent flow problem is equivalent
to a general concurrent flow problem with at most $n$ commodities.

commodity    s     t     demand

1            v1    v3    1
2            v2    v4    1
3            v1    v5    2
4            v1    v2    1

group    d(v1)    d(v2)    d(v3)    d(v4)    d(v5)

1        -4       1        1        0        2
2        0        -1       0        1        0

Figure 2.2: The original input, the grouped input and a solution to the grouped problem.

For a general concurrent flow problem, it may not be possible to reduce the number of
commodities. To simplify the running time bounds, we will assume that the number of commodities
is polynomial in $n$. In particular, we will use that $\log k = O(\log n)$.

We now explain how to convert a simple concurrent flow problem to a general concurrent
flow problem with $k^*$ commodities, where $k^*$ is the number of distinct sources: we combine
those commodities that share a source. In other words, for each source $s$ we define a demand
vector $d_s \in \mathbf{Z}^V$ as follows: for each commodity $i$ with $s_i = s$, we set $d_s(t_i) = d_i$; we set
$d_s(s) = -\sum_{i : s_i = s} d_i$; and we set all other demands to zero.

We give an example of combining and uncombining flows in Figures 2.2 and 2.3. In Figure
2.2, the input is a 4-commodity simple concurrent flow problem. Commodities 1, 3 and 4 all
have $v_1$ as a source. Hence we can combine these 3 commodities into 1 commodity group, group
1. The demand vector for this group appears in the second table. Node $v_1$ is a supply node
with 4 units of supply, and hence $d(v_1) = -4$. Nodes $v_2$ and $v_3$ have demand 1 and node $v_5$ has
demand 2. Node $v_4$, which had no demand in any of the three combined commodities, has no demand
in the grouped commodity. The second commodity group consists of the original commodity 2.
A solution for this grouped commodity is given in the graph. It is easy to check that the demands
for the two commodity groups are satisfied. Figure 2.3 shows how to convert the solution for
the grouped instance in Figure 2.2 into one for the original instance. Commodity group 1 has
been split into three commodities. The total amount of flow on each edge is still the same and
all the original demands are still satisfied.

commodity    s     t

1            v1    v3
2            v2    v4
3            v1    v5
4            v1    v2

Figure 2.3: The result of ungrouping the solution in Figure 2.2
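The grouping step can be illustrated with a short Python sketch. The function name is hypothetical, and the endpoints and demands of the four commodities are an assumed reading of the example, chosen so that the first group has 4 units of supply at $v_1$ as described in the text.

```python
from collections import defaultdict


def group_by_source(commodities):
    """Combine commodities that share a source.

    commodities: list of (source, sink, demand) triples.
    Returns {source: demand vector}, where a negative entry denotes a supply.
    """
    groups = defaultdict(lambda: defaultdict(int))
    for s, t, d in commodities:
        groups[s][t] += d  # demand at the sink
        groups[s][s] -= d  # matching supply at the shared source
    return {s: dict(vec) for s, vec in groups.items()}


# Assumed reading of the 4-commodity example.
commodities = [
    ("v1", "v3", 1),
    ("v2", "v4", 1),
    ("v1", "v5", 2),
    ("v1", "v2", 1),
]
grouped = group_by_source(commodities)
```

Nodes that do not appear in a group's vector have demand zero in that group. Each commodity touches two entries of one vector, so this sketch does $O(k)$ dictionary updates; writing out dense $n$-dimensional vectors for every group is what gives the $O(kn)$ bound quoted in Lemma 2.2.1.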

Lemma 2.2.1 Consider a simple $k$-commodity concurrent flow problem and the corresponding
$k^*$-commodity problem defined by combining commodities that share a source.

1. Given the ungrouped problem, the grouped problem can be created in $O(kn)$ time.

2. Any feasible solution to one can be converted to a solution to the other with the same
congestion.

3. The conversion of a solution for the $k^*$-commodity grouped problem to one for the $k$-commodity
ungrouped problem can be done in $O(k^* nm)$ time, or in $O(k^* m \log n)$ time using the dynamic
tree data structure.


Proof: The grouped instance can be created from the ungrouped one,
in $O(kn)$ time, by the procedure described above for combining commodities that
share a source. The conversion of a solution of the simple concurrent flow problem with $k$ commodities
into a solution of the $k^*$-commodity problem is straightforward: we simply add the flows that
share a common source. Assume that we are given a solution to the general concurrent flow
problem with $k^*$ commodities. Decompose the flow of each commodity into paths and cycles
and combine the flows on paths that have the same source and sink nodes, disregarding the
cycles. This procedure is known as flow decomposition and it is well known how to compute a
decomposition in $O(nm)$ time (see, for example, [3]) and in $O(m \log n)$ time using the dynamic
tree data structure [64]. ■
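To make the ungrouping step concrete, here is a minimal path-peeling sketch in Python. It is not the $O(nm)$ or dynamic-tree procedure cited in the proof: it assumes the grouped flow is already acyclic (so there are no cycles to discard), is given as nonnegative amounts on directed arcs, and satisfies conservation. The function name and the small instance are hypothetical.

```python
def decompose_acyclic(flow, source, demand):
    """Peel source-to-sink paths off an acyclic single-source flow.

    flow:   {(v, w): amount} on directed arcs (assumed acyclic and conserving).
    demand: {sink: amount} with positive demands.
    Returns a list of (path, amount) pairs.
    """
    flow = {arc: x for arc, x in flow.items() if x > 0}
    demand = {v: d for v, d in demand.items() if d > 0}
    paths = []
    while demand:
        # Walk along arcs that still carry flow until an unsatisfied sink is hit.
        path, v = [source], source
        while v not in demand:
            v = next(w for (x, w) in flow if x == v)
            path.append(v)
        arcs = list(zip(path, path[1:]))
        amount = min([demand[v]] + [flow[a] for a in arcs])
        for a in arcs:
            flow[a] -= amount
            if flow[a] == 0:
                del flow[a]
        demand[v] -= amount
        if demand[v] == 0:
            del demand[v]
        paths.append((path, amount))
    return paths


# Hypothetical grouped flow: source s ships 3 units through a to sinks t1 and t2.
grouped_flow = {("s", "a"): 3, ("a", "t1"): 1, ("a", "t2"): 2}
paths = decompose_acyclic(grouped_flow, "s", {"t1": 1, "t2": 2})
```

Adding up the amounts of the paths that end at a given sink recovers the flow of the original commodity with that source-sink pair.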

The sources and sinks play a symmetric role in the (undirected) problem, and hence $k^*$
in the lemma could have been defined as the number of nodes in any subset that contains
an endpoint of each commodity. While finding a minimum such node set is NP-complete, we
mention this formulation because in some cases it leads to an efficiently computable $k^*$ that is
smaller than the one defined above.

Except for the few places in this chapter where we explicitly distinguish between simple
and non-simple concurrent flow problems, all our bounds are for a $k^*$-commodity non-simple
concurrent flow problem, and hence they also apply to a $k$-commodity simple concurrent flow
problem. The only distinction between the two variants will be made in the routine Initialize
and its analysis in Lemma 2.4.2, and in the final analysis of our algorithms in Theorems 2.4.21,
2.4.23 and 2.4.32.

The main subroutine of our algorithm is a minimum-cost flow computation (of a single
commodity). We use the following, slightly unconventional definition. An instance of a
minimum-cost flow problem $M = (G, u, c, d_i)$ consists of a graph $G = (V, E)$ with edge capacities $u \in \mathbf{R}^E$,
edge costs $c \in \mathbf{R}^E$ and a demand vector $d_i$. The cost $C_i$ of a flow $f_i$ is $\sum_{vw \in E} c(vw) |f_i(vw)|$.
Given a demand vector $d_i$ and capacities $u$, the minimum-cost flow problem is the problem
of finding a flow of minimum cost that satisfies the conservation constraints (2.4) and has
$|f_i(vw)| \le u(vw)$ for each edge $vw \in E$. We denote the value of the minimum-cost flow by
$C_i^*$. The residual graph of a flow $f_i$, denoted $G_{f_i} = (V, E_{f_i})$, is the directed graph consisting
of the set of edges for which $f_i(vw) < u(vw)$ and the reversal of the set of edges for which
$f_i(vw) > -u(vw)$. In Section 3.3.3, we will need to work with the linear-programming dual of
a minimum-cost flow. The dual variables on the nodes are commonly referred to as prices, and
are denoted by $p$. A price function is a vector $p \in \mathbf{R}^V$. The reduced cost of an edge $vw \in E$ is
$c(vw) + p(v) - p(w)$, and $-c(vw) - p(v) + p(w)$ on reverse edges. Linear programming duality
implies that a flow $f_i$ is of minimum cost if and only if there exists a price function $p$ such
that the reduced costs of the edges in the residual graph of $f_i$ are nonnegative (complementary
slackness conditions).

For initialization, we will need to solve maximum flow problems. We use the following,
slightly unconventional definition. An instance of a maximum flow problem $M = (G, u, s_i, t_i)$
is a graph $G = (V, E)$ with edge capacities $u \in \mathbf{R}^E$, and two distinguished nodes, the source $s_i$
and the sink $t_i$. The maximum flow problem is the problem of finding the maximum value $d_i$
such that there exists a flow $f_i \in \mathbf{R}^E$ that satisfies the conservation constraints (2.1)-(2.3) and
has $|f_i(vw)| \le u(vw)$ for each edge $vw \in E$.

We will also need to solve a variant of the maximum flow problem that we call the feasible
flow problem. The input to a feasible flow problem $\mathcal{F} = (G, u, d_i)$ consists of a graph $G = (V, E)$
with edge capacities $u$ and a demand vector $d_i$. The object is to find a flow $f_i$ that satisfies the
conservation constraints (2.4) and has $|f_i(vw)| \le u(vw)$ for each edge $vw \in E$. It is well
known how to convert an instance $\mathcal{F}$ of the feasible flow problem with $n$ nodes and $m$ edges into
an instance $M$ of the maximum flow problem with at most $n + 2$ nodes and $m + 2n$ edges. Thus
both the maximum flow problem and the feasible flow problem can be solved by a maximum
flow computation on a graph with $O(n)$ nodes and $O(m)$ edges.
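One standard way to carry out this conversion is to add a super-source joined to every supply node and a super-sink joined to every demand node. The Python sketch below builds the augmented instance only and does not solve it. The function name and the labels `S*` and `T*` are hypothetical, and this is not necessarily the exact construction the thesis has in mind; it adds at most one edge per node, which stays within the bound quoted above.

```python
def feasible_to_max_flow(nodes, capacity, demand):
    """Build a max-flow instance from a feasible-flow instance.

    capacity: {(v, w): u(vw)}; demand: {v: d(v)}, negative values are supplies.
    Returns (nodes, capacity, source, sink, required); the original instance is
    feasible exactly when the maximum source-sink flow value equals `required`.
    """
    source, sink = "S*", "T*"  # hypothetical labels for the two new nodes
    cap = dict(capacity)
    required = 0
    for v in nodes:
        d = demand.get(v, 0)
        if d < 0:
            cap[(source, v)] = -d  # supply enters at v
        elif d > 0:
            cap[(v, sink)] = d     # demand leaves at v
            required += d
    return list(nodes) + [source, sink], cap, source, sink, required


# Hypothetical path a - b - c with 2 units of supply at a and demand 2 at c.
new_nodes, new_cap, s, t, required = feasible_to_max_flow(
    ["a", "b", "c"], {("a", "b"): 2, ("b", "c"): 2}, {"a": -2, "c": 2}
)
```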

2.2.1  Optimality  Conditions 

Linear programming duality can also be used to give a characterization of the optimum solution
for the concurrent flow problem. Let $\ell \in \mathbf{R}_+^E$ be a nonnegative length function. For nodes
$v, w \in V$, let $\mathrm{dist}_\ell(v, w)$ denote the length of the shortest path from $v$ to $w$ in $G$ with respect
to the length function $\ell$. The following theorem is a special case of the linear programming
duality theorem.

Theorem 2.2.2 For a simple multicommodity flow $f$ satisfying the demands $d_i$, $i = 1, \ldots, k$,
and capacities $\lambda \cdot u(vw)$, $\forall vw \in E$, and any length function $\ell$,

$$\lambda \sum_{vw \in E} \ell(vw) u(vw) \ge \sum_{vw \in E} \ell(vw) f(vw) = \sum_{i=1}^{k} \sum_{vw \in E} \ell(vw) |f_i(vw)| = \sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i} \ell(P) f_i(P) \ge \sum_{i=1}^{k} \mathrm{dist}_\ell(s_i, t_i) d_i. \tag{2.5}$$

Furthermore, a multicommodity flow $f$ minimizes $\lambda$ if and only if there exists a nonzero length
function $\ell$ for which the inequalities above all hold with equality.
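Read as a weak duality statement, the theorem says that any length function certifies the lower bound $\lambda^* \ge \sum_i \mathrm{dist}_\ell(s_i, t_i) d_i / \sum_{vw} \ell(vw) u(vw)$. The sketch below computes this bound with Dijkstra's algorithm. The function names and the triangle instance are assumptions made for illustration.

```python
import heapq
from fractions import Fraction


def shortest_distances(adj, src):
    """Dijkstra's algorithm; adj maps a node to a list of (neighbor, length)."""
    best = {src: 0}
    heap = [(0, src)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > best[v]:
            continue  # stale heap entry
        for w, length in adj.get(v, []):
            nd = d + length
            if w not in best or nd < best[w]:
                best[w] = nd
                heapq.heappush(heap, (nd, w))
    return best


def lower_bound(edges, length, capacity, commodities):
    """sum_i dist(s_i, t_i) d_i divided by sum_vw length(vw) u(vw)."""
    adj = {}
    for v, w in edges:
        adj.setdefault(v, []).append((w, length[(v, w)]))
        adj.setdefault(w, []).append((v, length[(v, w)]))
    numer = sum(d * shortest_distances(adj, s)[t] for s, t, d in commodities)
    denom = sum(length[e] * capacity[e] for e in edges)
    return Fraction(numer) / Fraction(denom)


# Hypothetical triangle with unit capacities and one commodity a -> b of demand 2.
edges = [("a", "b"), ("a", "c"), ("c", "b")]
capacity = {e: 1 for e in edges}
commodities = [("a", "b", 2)]
uniform = lower_bound(edges, {e: 1 for e in edges}, capacity, commodities)
tight = lower_bound(
    edges, {("a", "b"): 2, ("a", "c"): 1, ("c", "b"): 1}, capacity, commodities
)
```

With all lengths equal to 1 the bound is only 2/3. The second length function gives 1, which matches the congestion of the routing that sends one unit on each of the two $a$-$b$ paths, so that routing is optimal.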


Theorem 2.2.2 is a characterization of optimality that relates the value of $\lambda$ to the lengths
of the shortest path for each commodity. We shall also use a slightly different characterization,
one that relates the value of $\lambda$ to the costs of minimum-cost flows in appropriately derived
graphs. While these characterizations can be proven to be equivalent, by measuring optimality
in terms of minimum-cost flows, we are able to develop faster algorithms for the general case.

Let $\ell$ be a nonnegative length function on the edges, $f$ a multicommodity flow, and $\lambda$
its congestion. Let $C_i$ be the cost of the current flow for commodity $i$, using $\ell$ as the cost
function, i.e., $C_i = \sum_{vw \in E} \ell(vw) |f_i(vw)|$. For a commodity $i$, let $C_i^*(\lambda)$ be the value of a
minimum-cost flow $f_i^*$ satisfying the demands of commodity $i$, subject to costs $\ell$ and capacities
$\lambda \cdot u(vw)$, i.e., let $f_i^*$ be a flow that satisfies $|f_i^*(vw)| \le \lambda \cdot u(vw)$ and minimizes the cost
$\sum_{vw \in E} \ell(vw) |f_i^*(vw)|$. For brevity we shall sometimes use $C_i^*$ to abbreviate $C_i^*(\lambda)$.

The following theorem is a restatement of Theorem 2.2.2.

Theorem 2.2.3 For a (general) multicommodity flow $f$ satisfying capacities $\lambda \cdot u(vw)$, and a
length function $\ell$,

$$\lambda \sum_{vw \in E} \ell(vw) u(vw) \ge \sum_{vw \in E} \ell(vw) f(vw) = \sum_{i=1}^{k} \sum_{vw \in E} \ell(vw) |f_i(vw)| = \sum_{i=1}^{k} C_i \ge \sum_{i=1}^{k} C_i^*(\lambda). \tag{2.6}$$

Furthermore, a multicommodity flow $f$ minimizes $\lambda$ if and only if there exists a nonzero length
function $\ell$ for which the inequalities above all hold with equality.

We would like to be able to say that the ratio of the last term and the multiplier of $\lambda$ in
the first term gives a lower bound on the optimal value $\lambda^*$. The analogous statement for the
inequality (2.5) is obvious, because neither of the two terms depends on $\lambda$. In Theorem 2.2.3
the last term, $\sum_{i=1}^{k} C_i^*(\lambda)$, depends on $\lambda$. Observe, however, that the minimum cost of a flow
subject to capacity constraints $\lambda \cdot u(vw)$ cannot increase if $\lambda$ increases.

Lemma 2.2.4 Suppose that we have a multicommodity flow satisfying capacities $\lambda \cdot u(vw)$ and
$\ell$ is a length function. Then $\lambda^* \ge \sum_{i=1}^{k} C_i^*(\lambda) / \left( \sum_{vw \in E} \ell(vw) u(vw) \right)$.

Another  well-known  characterization  of  optimality  for  a  linear  program  is  known  as  the  com¬ 
plementary  slackness  conditions.  One  way  to  formulate  these  conditions  for  multicommodity 
flow  is  to  formulate  them  as  conditions  on  edges  and  individual  commodities. 

Theorem 2.2.5 A multicommodity flow $f$ has minimum $\lambda$ if and only if there exists a nonzero
length function $\ell$ such that

1. for each edge $vw \in E$, either $\ell(vw) = 0$ or $f(vw) = \lambda \cdot u(vw)$, and

2. for each commodity $i$, $C_i = C_i^*(\lambda)$.

Figure 2.4: An optimal solution


The  complementary  slackness  conditions  can  also  be  formulated  in  terms  of  conditions  on 
edges  and  paths.  The  following  theorem  is  equivalent  to  the  definition  above,  but  will  turn  out 
to  be  more  useful  in  the  unit  capacity  case. 

Theorem 2.2.6 A multicommodity flow $f$ has minimum $\lambda$ if and only if there exists a nonzero
length function $\ell$ such that

1. for each edge $vw \in E$, either $\ell(vw) = 0$ or $f(vw) = \lambda \cdot u(vw)$, and

2. for each commodity $i$ and every path $P \in \mathcal{P}_i$ with $f_i(P) > 0$ we have $\ell(P) = \mathrm{dist}_\ell(s_i, t_i)$.

We illustrate the concepts of this section by giving a length function that demonstrates the
optimality of the flow given in Figure 1.3. In Figure 2.4, we give the flows and length functions.
We first check Theorem 2.2.3. The leftmost term is

$$\lambda \sum_{vw \in E} \ell(vw) u(vw) = \tfrac{1}{2}(1 \cdot 2 + 1 \cdot 2 + 1 \cdot 4 + 1 \cdot 4 + 0 \cdot 3 + 0 \cdot 3) = 6.$$

The middle terms are

$$\sum_{vw \in E} \ell(vw) f(vw) = \sum_{i=1}^{k} \sum_{vw \in E} \ell(vw) |f_i(vw)| = \sum_{i=1}^{k} C_i = 1 + 1 + 2 + 2 = 6.$$

Finally, in order to compute the last term, we need to compute a minimum-cost flow for each
commodity. The value of the minimum-cost flow is equal to the length of the shortest $s_i$-$t_i$ path for
commodity $i$ multiplied by the demand of $i$. So $C_1^* = 2 \cdot 1$, $C_2^* = 0 \cdot 1$, and $C_3^* = 2 \cdot 2$. Summing,
we get 6 and we have shown that at optimality the terms are all equal.

We can also check for optimality using the complementary slackness conditions of Theorem
2.2.5. The first condition is easily verified and the second was verified in the previous paragraph.
Hence we have another “proof” that the flow given in Figure 1.3 is optimal.
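The arithmetic of this check is easy to replay. The snippet below is a sketch under an assumed reading of the example: congestion $\lambda = 1/2$, four edges of length 1 with capacities 2, 2, 4, 4, two edges of length 0 with capacity 3, and three commodities whose shortest-path lengths and demands are (2, 1), (0, 1) and (2, 2). The variable names are illustrative only.

```python
from fractions import Fraction

lam = Fraction(1, 2)  # assumed congestion of the flow in the example

# (length, capacity) for each of the six edges, as read from the example.
length_capacity = [(1, 2), (1, 2), (1, 4), (1, 4), (0, 3), (0, 3)]
first_term = lam * sum(l * u for l, u in length_capacity)

# length(vw) * f(vw) on the four edges of length 1 (each carries lam * u(vw)).
middle_term = sum([1, 1, 2, 2])

# (shortest-path length, demand) for each commodity.
dist_demand = [(2, 1), (0, 1), (2, 2)]
last_term = sum(dist * d for dist, d in dist_demand)
```

All three quantities come out to 6, which is the equality that Theorem 2.2.3 requires of an optimal flow.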

2.3  Relaxed  Optimality  Conditions 

The conditions in Theorems 2.2.5 and 2.2.6 describe when a solution is optimal. The goal of
our algorithms, however, is to find a multicommodity flow $f$ and a length function $\ell$ such that
the lower bound of Lemma 2.2.4 is within a $(1 + \epsilon)$ factor of optimal, i.e.,

$$\lambda \le (1 + \epsilon) \sum_{i=1}^{k} C_i^*(\lambda) \Big/ \Big( \sum_{vw \in E} \ell(vw) u(vw) \Big) \le (1 + \epsilon) \lambda^*.$$

In this case, we have proved that $f$ is $\epsilon$-optimal, and $\ell$ is a particular length function that allows
us to verify $\epsilon$-optimality.

Let $\epsilon > 0$ be an error parameter, $f$ a multicommodity flow satisfying capacities $\lambda \cdot u(vw)$,
and $\ell$ a length function. We say that a commodity $i$ is $\epsilon$-good if

$$C_i - C_i^*(\lambda) \le \epsilon C_i + \epsilon \frac{\lambda}{k} \sum_{vw \in E} \ell(vw) u(vw).$$

Otherwise, we say that the commodity is $\epsilon$-bad. Intuitively, a commodity is $\epsilon$-good if it is almost
as cheap as the minimum cost possible for that commodity or it is at most a small fraction of
$\sum_{vw \in E} \ell(vw) u(vw)$, the total cost of the network. We use this notion in defining a relaxed
version of the complementary slackness conditions. We define the following relaxed optimality
conditions (with respect to a multicommodity flow $f$ that satisfies capacity constraints $\lambda \cdot u(vw)$,
a length function $\ell$ and an error parameter $\epsilon$):

(R1) For each edge $vw \in E$, either

$$(1 + \epsilon) f(vw) \ge \lambda \cdot u(vw)$$

or

$$u(vw) \ell(vw) \le \frac{\epsilon}{m} \sum_{xy \in E} \ell(xy) u(xy).$$

(R2) $$\sum_{i \ \epsilon\text{-bad}} C_i \le \epsilon \sum_{i=1}^{k} C_i.$$

Typically, complementary slackness conditions are used as a way to check whether a solution
is optimal. We will use the relaxed optimality conditions to check when a solution is $\epsilon$-optimal.
We now show that if these two conditions are satisfied then the gap between the first and last
terms in (2.6) is small. We begin by showing that the gap between the first and second terms
is small.
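A direct check of the two relaxed conditions can be sketched as follows. The sketch assumes that the $\epsilon$-good test reads $C_i - C_i^*(\lambda) \le \epsilon C_i + \epsilon (\lambda / k) \sum_{vw} \ell(vw) u(vw)$ and that the threshold in R1 is $(\epsilon / m) \sum_{xy} \ell(xy) u(xy)$. The function name and the toy numbers are hypothetical, and the costs $C_i$ and $C_i^*(\lambda)$ are taken as given rather than computed by a minimum-cost flow routine.

```python
from fractions import Fraction


def check_relaxed(eps, lam, edges, f, u, length, C, C_star):
    """Return (R1 holds, R2 holds, list of eps-bad commodities)."""
    m, k = len(edges), len(C)
    total = sum(length[e] * u[e] for e in edges)  # sum of length(vw) * u(vw)
    r1 = all(
        (1 + eps) * f[e] >= lam * u[e] or u[e] * length[e] <= eps * total / m
        for e in edges
    )
    bad = [i for i in range(k) if C[i] - C_star[i] > eps * C[i] + eps * lam * total / k]
    r2 = sum(C[i] for i in bad) <= eps * sum(C)
    return r1, r2, bad


# Toy data: two unit-capacity edges, one saturated with length 1, one idle with length 0.
eps, lam = Fraction(1, 10), 1
edges = ["e1", "e2"]
f = {"e1": 1, "e2": 0}
u = {"e1": 1, "e2": 1}
length = {"e1": 1, "e2": 0}
good_case = check_relaxed(eps, lam, edges, f, u, length, C=[1], C_star=[1])
bad_case = check_relaxed(eps, lam, edges, f, u, length, C=[2], C_star=[1])
```

In the second call the single commodity costs twice its minimum, so it is $\epsilon$-bad and R2 fails, while R1 is unaffected because it depends only on the edges.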


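Lemma 2.3.1 Suppose that flow $f$ and length function $\ell$ satisfy Relaxed Optimality Condition
R1. Then

$$\lambda \sum_{vw \in E} \ell(vw) u(vw) \le \left( \frac{1 + \epsilon}{1 - \epsilon} \right) \sum_{vw \in E} \ell(vw) f(vw).$$

Proof: Let $A$ denote the set of edges for which $(1 + \epsilon) f(vw) \ge \lambda \cdot u(vw)$. We can estimate the
sum by summing separately over the sets $A$ and $E \setminus A$, i.e.,

$$\lambda \sum_{vw \in E} \ell(vw) u(vw) = \lambda \sum_{vw \in A} \ell(vw) u(vw) + \lambda \sum_{vw \in E \setminus A} \ell(vw) u(vw).$$

Now we bound the first sum using the first part of Relaxed Optimality Condition R1 and
the second sum using the second part of Relaxed Optimality Condition R1. Thus

$$\begin{aligned}
\lambda \sum_{vw \in E} \ell(vw) u(vw) &\le (1 + \epsilon) \sum_{vw \in A} \ell(vw) f(vw) + \lambda \sum_{vw \in E \setminus A} \Big( \frac{\epsilon}{m} \sum_{xy \in E} \ell(xy) u(xy) \Big) \\
&\le (1 + \epsilon) \sum_{vw \in A} \ell(vw) f(vw) + \lambda m \Big( \frac{\epsilon}{m} \sum_{vw \in E} \ell(vw) u(vw) \Big) \\
&\le (1 + \epsilon) \sum_{vw \in A} \ell(vw) f(vw) + \lambda \epsilon \sum_{vw \in E} \ell(vw) u(vw).
\end{aligned}$$

This chain of inequalities implies that

$$(1 - \epsilon) \lambda \sum_{vw \in E} \ell(vw) u(vw) \le (1 + \epsilon) \sum_{vw \in A} \ell(vw) f(vw) \le (1 + \epsilon) \sum_{vw \in E} \ell(vw) f(vw).$$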


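We can now bound the gap between the last two terms in (2.6).

Lemma 2.3.2 Suppose that flow $f$ and length function $\ell$ satisfy Relaxed Optimality Conditions
R1 and R2. Then

$$\sum_{i=1}^{k} C_i \le (1 + 5\epsilon) \sum_{i=1}^{k} C_i^*(\lambda).$$

Proof: We can bound the sum $\sum_{i=1}^{k} C_i$ by considering the contribution from $\epsilon$-good and
$\epsilon$-bad commodities separately. The total contribution from all $\epsilon$-bad commodities is bounded in
Relaxed Optimality Condition R2 by $\epsilon \sum_{i=1}^{k} C_i$. The contribution from each $\epsilon$-good commodity
can be bounded using the definition of an $\epsilon$-good commodity, i.e.,

$$C_i \le \frac{1}{1 - \epsilon} \Big( C_i^*(\lambda) + \epsilon \frac{\lambda}{k} \sum_{vw \in E} \ell(vw) u(vw) \Big).$$

Letting $B$ represent the set of all $\epsilon$-good commodities and summing over both $\epsilon$-good and $\epsilon$-bad
commodities, we get that:

$$\sum_{i=1}^{k} C_i = \sum_{i \in B} C_i + \sum_{i \notin B} C_i \le \frac{1}{1 - \epsilon} \sum_{i \in B} \Big( C_i^*(\lambda) + \epsilon \frac{\lambda}{k} \sum_{vw \in E} \ell(vw) u(vw) \Big) + \epsilon \sum_{i=1}^{k} C_i.$$

Combining like terms, we obtain

$$\sum_{i=1}^{k} C_i \le \frac{1}{(1 - \epsilon)^2} \sum_{i=1}^{k} C_i^*(\lambda) + \frac{\epsilon}{(1 - \epsilon)^2} \lambda \sum_{vw \in E} \ell(vw) u(vw).$$

Now we can use Lemma 2.3.1 to bound the second term on the righthand side. This yields

$$\sum_{i=1}^{k} C_i \le \frac{1}{(1 - \epsilon)^2} \sum_{i=1}^{k} C_i^*(\lambda) + \frac{\epsilon (1 + \epsilon)}{(1 - \epsilon)^3} \sum_{vw \in E} \ell(vw) f(vw) \tag{2.9}$$

$$= \frac{1}{(1 - \epsilon)^2} \sum_{i=1}^{k} C_i^*(\lambda) + \frac{\epsilon (1 + \epsilon)}{(1 - \epsilon)^3} \sum_{i=1}^{k} C_i. \tag{2.10}$$

Combining like terms,

$$\sum_{i=1}^{k} C_i \le \frac{1 - \epsilon}{(1 - \epsilon)^3 - \epsilon (1 + \epsilon)} \sum_{i=1}^{k} C_i^*(\lambda).$$

A simple algebraic calculation shows that for $\epsilon < 1/9$ the right side is at most $(1 + 5\epsilon) \sum_{i=1}^{k} C_i^*(\lambda)$.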


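Theorem 2.3.3 Suppose $f$, $\ell$, and $\epsilon$ satisfy the relaxed optimality conditions and $\epsilon < 1/9$. Then
$f$ is $O(\epsilon)$-optimal, and in particular, $\lambda \le (1 + 9\epsilon) \lambda^*$.

Proof: Combining Lemma 2.3.1 and 2.3.2,

$$\lambda \sum_{vw \in E} \ell(vw) u(vw) \le \frac{(1 + \epsilon)(1 + 5\epsilon)}{1 - \epsilon} \sum_{i=1}^{k} C_i^*(\lambda).$$

Simple algebra and Lemma 2.2.4 prove the theorem. ■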
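The "simple algebra" in these two proofs can be spot-checked numerically. The sketch below assumes the combined factors take the forms $(1 - \epsilon) / ((1 - \epsilon)^3 - \epsilon(1 + \epsilon))$ for the lemma and $(1 + \epsilon)(1 + 5\epsilon) / (1 - \epsilon)$ for the theorem, which is what the chain of inequalities gives when worked through. It evaluates both exactly on a grid of rational values of $\epsilon$ up to 1/9. This is a sanity check, not a proof, and the function names are illustrative.

```python
from fractions import Fraction


def lemma_factor(eps):
    """Assumed combined factor relating sum C_i to sum C_i^*(lambda)."""
    return (1 - eps) / ((1 - eps) ** 3 - eps * (1 + eps))


def theorem_factor(eps):
    """Assumed factor obtained by combining Lemma 2.3.1 with the (1 + 5 eps) bound."""
    return (1 + eps) * (1 + 5 * eps) / (1 - eps)


# Exact rational grid 1/900, 2/900, ..., 100/900 = 1/9.
grid = [Fraction(j, 900) for j in range(1, 101)]
lemma_ok = all(lemma_factor(e) <= 1 + 5 * e for e in grid)
theorem_ok = all(theorem_factor(e) <= 1 + 9 * e for e in grid)
```

Both checks pass on the whole grid. The first bound needs a cap on $\epsilon$ somewhere slightly above 1/9, since at $\epsilon = 0.12$ the lemma factor already exceeds $1 + 5\epsilon$.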


The  Unit  Capacity  Case 

For the unit capacity case, we sometimes benefit from using the optimality conditions of
Theorem 2.2.2. We can develop similar conditions for relaxed optimality where the optimality is in
terms of paths rather than commodities. The development is similar to that for commodities,
so our presentation shall be more concise.

Let $0 < \epsilon < 1/12$ be an error parameter, $f$ a multicommodity flow and $\ell$ a length function.
We say that a path $P \in \mathcal{P}_i$ for a commodity $i$ is $\epsilon$-short if

$$\ell(P) - \mathrm{dist}_\ell(s_i, t_i) \le \epsilon \ell(P) + \epsilon \frac{\lambda}{\min\{D, k d_i\}} \sum_{vw \in E} \ell(vw) u(vw),$$

and $\epsilon$-long otherwise. The intuition is that a flow path is $\epsilon$-short if it is short in either a relative
or an absolute sense, i.e., it is either almost as short as the shortest possible $(s_i, t_i)$-path or it is
at most a small fraction of $\sum_{vw \in E} \ell(vw) u(vw)$. We use this notion in defining relaxed optimality
conditions for the unit-capacity case (with respect to a flow $f$, a length function $\ell$ and an error
parameter $\epsilon$). The new relaxed optimality conditions are condition R1 defined above and the
following variant of condition R2:

(R2') $$\sum_{i=1}^{k} \sum_{\substack{P \in \mathcal{P}_i \\ P \ \epsilon\text{-long}}} \ell(P) f_i(P) \le \epsilon \sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i} \ell(P) f_i(P).$$

Relaxed Optimality Condition R2' says that the amount of flow that is on $\epsilon$-long paths
contributes a small fraction of the sum $f \cdot \ell$.

Lemma 2.3.1 bounds the gap between the first and second terms in (2.5). We now proceed
to bound the gap between the last two terms.

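Lemma 2.3.4 Suppose $f$, $\ell$, and $\epsilon$ satisfy the Relaxed Optimality Conditions R1 and R2'.
Then

$$\sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i} \ell(P) f_i(P) \le (1 + 8\epsilon) \sum_{i=1}^{k} \mathrm{dist}_\ell(s_i, t_i) d_i.$$

Proof: We break the sum $\sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i} \ell(P) f_i(P)$ into two parts: the sum over $\epsilon$-short paths
and the sum over $\epsilon$-long paths. Relaxed optimality condition R2' gives us an upper bound of
$\epsilon \sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i} \ell(P) f_i(P)$ on the sum over the $\epsilon$-long paths. Taking the definition of an $\epsilon$-short
path and multiplying both sides by $f_i(P)$ gives us the following bound that applies for $\epsilon$-short
paths:

$$\ell(P) f_i(P) \le \frac{1}{1 - \epsilon} \Big( \mathrm{dist}_\ell(s_i, t_i) f_i(P) + \epsilon \frac{\lambda f_i(P)}{\min\{D, k d_i\}} \sum_{vw \in E} \ell(vw) u(vw) \Big).$$

Let $S$ denote the set of $\epsilon$-short paths. Summing over all $\epsilon$-short paths and using the facts
that $\sum_{P \in \mathcal{P}_i \cap S} f_i(P) \le \sum_{P \in \mathcal{P}_i} f_i(P) = d_i$ and $(\min\{D, k d_i\})^{-1} \le D^{-1} + (k d_i)^{-1}$, we get

$$\begin{aligned}
\sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i \cap S} \ell(P) f_i(P) &\le \frac{1}{1 - \epsilon} \sum_{i=1}^{k} \Big( \mathrm{dist}_\ell(s_i, t_i) d_i + \epsilon \frac{\lambda d_i}{\min\{D, k d_i\}} \sum_{vw \in E} \ell(vw) u(vw) \Big) \\
&\le \frac{1}{1 - \epsilon} \sum_{i=1}^{k} \mathrm{dist}_\ell(s_i, t_i) d_i + \frac{\epsilon}{1 - \epsilon} \sum_{i=1}^{k} \Big( \frac{d_i}{D} + \frac{1}{k} \Big) \lambda \sum_{vw \in E} \ell(vw) u(vw).
\end{aligned}$$

Now observe that there are exactly $k$ commodities and $\sum_{i=1}^{k} d_i = D$, so the last term sums
to exactly $\frac{2\epsilon}{1 - \epsilon} \lambda \sum_{vw \in E} \ell(vw) u(vw)$. Thus

$$\sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i \cap S} \ell(P) f_i(P) \le \frac{1}{1 - \epsilon} \sum_{i=1}^{k} \mathrm{dist}_\ell(s_i, t_i) d_i + \frac{2\epsilon}{1 - \epsilon} \lambda \sum_{vw \in E} \ell(vw) u(vw).$$

Combining the bounds on the sum over $\epsilon$-long and $\epsilon$-short paths,

$$\sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i} \ell(P) f_i(P) \le \frac{1}{1 - \epsilon} \sum_{i=1}^{k} \mathrm{dist}_\ell(s_i, t_i) d_i + \frac{2\epsilon}{1 - \epsilon} \lambda \sum_{vw \in E} \ell(vw) u(vw) + \epsilon \sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i} \ell(P) f_i(P).$$

Using Lemma 2.3.1 to bound $\lambda \sum_{vw \in E} \ell(vw) u(vw)$ and the equation $\sum_{vw \in E} \ell(vw) f(vw) =
\sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i} \ell(P) f_i(P)$, we obtain

$$\sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i} \ell(P) f_i(P) \le \frac{1}{1 - \epsilon} \sum_{i=1}^{k} \mathrm{dist}_\ell(s_i, t_i) d_i + \frac{2\epsilon (1 + \epsilon)}{(1 - \epsilon)^2} \sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i} \ell(P) f_i(P) + \epsilon \sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i} \ell(P) f_i(P).$$

Combining like terms yields the equation

$$\sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i} \ell(P) f_i(P) \le \frac{1 - \epsilon}{(1 - \epsilon)^3 - 2\epsilon (1 + \epsilon)} \sum_{i=1}^{k} \mathrm{dist}_\ell(s_i, t_i) d_i.$$

Simple algebra shows that the right-hand side is at most $(1 + 8\epsilon) \sum_{i=1}^{k} \mathrm{dist}_\ell(s_i, t_i) d_i$ if $\epsilon < 1/12$.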


CONCURRENT(I, ε)

    f ← INITIALIZE(I)
    while f is not ε-optimal
        f ← DECONGEST(f, ε)
    return f

Figure 2.5: Algorithm Concurrent
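The control flow of Figure 2.5 can be written as a small Python skeleton. Everything below is a sketch: the three callables are hypothetical stand-ins supplied by the caller, not the thesis's Initialize and Decongest procedures or its optimality test. The toy versions represent a "flow" only by its congestion, with the optimum fixed at 1.

```python
from fractions import Fraction


def concurrent(instance, eps, initialize, decongest, is_eps_optimal):
    """Skeleton of the loop in Figure 2.5; the three callables are supplied by the caller."""
    f = initialize(instance)
    while not is_eps_optimal(f, eps):
        f = decongest(f, eps)
    return f


def toy_initialize(instance):
    """Toy stand-in: the instance is just a starting congestion value."""
    return Fraction(instance)


def toy_decongest(lam, eps):
    """Toy stand-in: halve the distance to the (assumed) optimum congestion 1."""
    return 1 + (lam - 1) / 2


def toy_is_eps_optimal(lam, eps):
    """Toy stand-in: compare against the known optimum 1."""
    return lam <= (1 + eps) * 1


result = concurrent(8, Fraction(1, 2), toy_initialize, toy_decongest, toy_is_eps_optimal)
```

Starting from congestion 8, the toy loop passes through 9/2, 11/4 and 15/8 and stops at 23/16, the first value within a factor 3/2 of the optimum.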

Combining the previous lemma with Lemma 2.3.1 yields the following theorem.

Theorem 2.3.5 Suppose $f$, $\ell$, and $\epsilon$ satisfy the Relaxed Optimality Conditions R1 and R2'
and $\epsilon < 1/12$. Then $f$ is $O(\epsilon)$-optimal, i.e., $\lambda$ is at most a factor $(1 + 12\epsilon)$ more than the minimum
possible.

Proof: Combining Lemma 2.3.1 with Lemma 2.3.4 gives that

$$\lambda \sum_{vw \in E} \ell(vw) u(vw) \le \frac{(1 + \epsilon)(1 + 8\epsilon)}{1 - \epsilon} \sum_{i=1}^{k} \mathrm{dist}_\ell(s_i, t_i) d_i.$$

Simple algebra completes the proof. ■

The  remainder  of  this  chapter  focuses  on  algorithms  that  achieve  the  various  relaxed  opti¬ 
mality  conditions. 


2.4  Algorithms  for  the  General  Concurrent  Flow  Problem 

In  this  section  we  give  an  algorithm,  Concurrent,  for  approximately  solving  the  general 
concurrent  flow  problem.  In  Section  2.4.1,  we  will  bound  the  time  needed  in  terms  of  the 
number  of  minimum-cost  flow  computations.  For  simplicity  of  presentation,  throughout  this 
section  we  shall  use  a  model  of  computation  that  allows  the  use  of  exact  arithmetic  on  real 
numbers  and  provides  exponentiation  as  a  single  step.  In  Section  2.4.2  we  will  show  how  to 
modify  our  algorithms  to  work  in  the  standard  RAM  model.  The  question  of  which  minimum- 
cost  flow  algorithm  to  use  is  deferred  to  Section  2.4.3,  in  which  we  show  how  several  different 
minimum-cost  flow  algorithms  can  be  used,  each  of  which  leads  to  a  different  running  time. 
2.4.1  Solving  Concurrent  Flow  Problems

In this section, we give approximation algorithms for the general concurrent flow problem.
We give two algorithms, Concurrent and ScalingConcurrent. The former is the basic
algorithm on which we will concentrate for most of this section. The latter is an algorithm
that employs a technique which we call ε-scaling and is best when ε = o(1). We defer our
discussion of ScalingConcurrent until the end of this section.

We begin with a high-level description of our main algorithm Concurrent, which appears
in Figure 2.5. Algorithm Concurrent takes as input an instance I of the concurrent flow
problem and an error parameter ε, ε < 5, and returns a 9ε-optimal concurrent flow. The
algorithm first calls a procedure Initialize which, given an instance of the concurrent flow
problem, returns a 2k-optimal flow. The remainder of the algorithm consists of a sequence of
calls to a procedure called Decongest. Decongest takes as input a flow f that has congestion
λ_0 and an error parameter ε and returns a flow which is either 9ε-optimal or has congestion at
most λ_0/2.
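The control flow described above can be sketched in a few lines of Python. This is only an illustrative toy, not the procedure itself: a flow is represented solely by its congestion, the optimum λ* is assumed to be known so that the stopping test is trivial, and `toy_decongest` is a hypothetical stand-in that simply halves the congestion.

```python
import math

def concurrent(initial_congestion, optimum, eps, decongest):
    """Call `decongest` until the congestion is within (1 + 9*eps) of optimum.

    Toy model: a flow is represented only by its congestion value.
    Returns the final congestion and the number of calls made.
    """
    lam = initial_congestion
    calls = 0
    while lam > (1 + 9 * eps) * optimum:
        lam = decongest(lam, optimum, eps)
        calls += 1
    return lam, calls

def toy_decongest(lam, optimum, eps):
    # Hypothetical stand-in: halve the congestion, or stop at a
    # (1 + 9*eps)-optimal value if halving would go below it.
    return max(lam / 2, (1 + 9 * eps) * optimum)

k, optimum, eps = 16, 1.0, 0.01
# Start from a 2k-optimal flow, as Initialize would provide.
lam, calls = concurrent(2 * k * optimum, optimum, eps, toy_decongest)
```

Starting from congestion 2kλ*, the loop makes at most about log_2(2k) calls, which is the quantity bounded in Lemma 2.4.1.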

We  begin  our  analysis  by  bounding  the  running  time  of  Concurrent  in  terms  of  the 
running  time  of  Initialize  and  Decongest. 

Lemma 2.4.1 Let T_I = T_I(I) be the running time of procedure Initialize, a procedure that
returns a 2k-optimal flow. Let T_D = T_D(I) be the running time of procedure Decongest, a
procedure that either returns a 9ε-optimal flow or decreases λ by a factor of 2. Then, given an
instance I of a concurrent flow problem, algorithm Concurrent finds an ε-optimal solution in
O(T_I + T_D log k) time.

Proof: We first call procedure Initialize to find an initial solution that is 2k-optimal, i.e., one
for which λ ≤ 2kλ*. Each call to Decongest, except for the final one, decreases λ by a factor
of 2 and we continue to call Decongest until λ ≤ (1 + 9ε)λ*. Since λ never drops below λ*, the
number of iterations is at most one more than the logarithm of the ratio of the initial and final
values, that is, at most

    \log_2 \frac{2k\lambda^*}{\lambda^*} + 1 = O(\log k).  ■
INITIALIZE(I, ε)
    for i = 1, ..., k
        if commodity i is simple
        then
            Compute g_i, a maximum flow from s_i to t_i in instance M = (G, u, s_i, t_i).
      (*)   f_i(vw) ← g_i(vw) · (d_i/|g_i|)  ∀vw ∈ E.
        else
            λ_low ← D_i/(mU); λ_high ← nD_i.
            while (λ_high − λ_low) > D_i/(mU)
                λ_mid ← (λ_high + λ_low)/2.
                u'(vw) ← u(vw) · λ_mid  ∀vw ∈ E.
                if there is a feasible flow in instance F' = (G, u', d_i)
                then
                    λ_high ← λ_mid
                else
                    λ_low ← λ_mid
            Let λ_i ← λ_high.
            u'(vw) ← u(vw) · λ_i  ∀vw ∈ E.
            Let f_i be a feasible flow in instance F = (G, u', d_i).
    return f

Figure 2.6: Procedure Initialize

In the remainder of this section we shall describe how to implement the various parts of
algorithm Concurrent. First, we will describe procedure Initialize, which finds a “good”
initial solution to the given concurrent flow problem. Then, we will describe procedure
Decongest, which takes a flow with congestion λ and produces a new flow that is either 9ε-optimal
or has congestion at most λ/2.

Finding  an  Initial  Solution 

This section describes procedure Initialize, which takes as input an instance I of the concurrent
flow problem and outputs a flow which is 2k-optimal. See Figure 2.6. The main idea is that
we separately route each commodity i in a good way. The algorithm is broken into two cases.
If a commodity i is simple then we find a maximum flow g_i of value |g_i| from s_i to t_i and then
scale the flow on each edge by d_i/|g_i|. If a commodity is not simple, then a series of maximum
flow computations must be performed. We perform a binary search over the range of possible
values of λ_i, and at each iteration test whether there exists a flow with congestion λ_i. In either
case, the flow found for commodity i has congestion O(λ*). Combining all the commodities
yields a flow with congestion O(kλ*).

Lemma 2.4.2 Let T_MF = T_MF(M) be the time to compute a maximum flow on instance M =
(G, u, s, t). Then procedure Initialize finds a 2k-optimal multicommodity flow satisfying demands
in O(k log(nU) T_MF) time. Given a simple multicommodity flow problem, Initialize finds a
2k-optimal multicommodity flow satisfying demands in O(k T_MF) time.

Proof: For each i = 1, ..., k, Initialize finds a flow f_i for the one-commodity concurrent flow
problem consisting solely of commodity i. Let λ_i be the congestion of flow f_i and let λ_i* be the
minimum possible value of λ_i. Clearly, for each i, λ_i* ≤ λ*. Assume that Initialize finds a flow
with λ_i ≤ 2λ_i* for each commodity i. Combining the flows for all commodities yields a flow
with congestion

    \lambda \le \sum_{i=1}^{k} \lambda_i \le \sum_{i=1}^{k} 2\lambda_i^* \le \sum_{i=1}^{k} 2\lambda^* = 2k\lambda^*.

We now show that, for each i, Initialize actually finds such a flow.

Consider first the case when commodity i has a single source and a single sink. The algorithm
computes g_i, a maximum flow for instance M = (G, u, s_i, t_i). Let |g_i| be the value of this flow.
Then by the maximum-flow minimum-cut theorem [29, 14], there exists a cut with total capacity
|g_i|. It is easy to see that the smallest amount by which we can multiply the capacity of the cut
and have a flow satisfying demand d_i is d_i/|g_i|. Therefore, λ_i* = d_i/|g_i|. Further, if we scale
the value of the flow on each edge by d_i/|g_i|, as in Line (*), we now have a flow that satisfies
demands and has congestion λ_i = λ_i*.
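The scaling step in Line (*) can be illustrated on a tiny hand-built example. The sketch below assumes a maximum flow has already been computed (here it is simply written down for a three-edge network); the helper names `scale_to_demand` and `congestion` are illustrative and do not appear in the text.

```python
def scale_to_demand(max_flow, flow_value, demand):
    """Scale every edge flow by demand / flow_value, as in Line (*)."""
    factor = demand / flow_value
    return {edge: x * factor for edge, x in max_flow.items()}

def congestion(flow, capacity):
    """Largest ratio of flow to capacity over all edges."""
    return max(abs(flow[e]) / capacity[e] for e in flow)

# Two parallel routes from s to t: s-a-t with capacity 2 and s-t with capacity 1.
capacity = {("s", "a"): 2.0, ("a", "t"): 2.0, ("s", "t"): 1.0}
g = {("s", "a"): 2.0, ("a", "t"): 2.0, ("s", "t"): 1.0}  # a maximum flow, value 3
demand = 6.0

f = scale_to_demand(g, flow_value=3.0, demand=demand)
lam = congestion(f, capacity)  # equals demand / flow_value = 2
```

Because the maximum flow saturates a minimum cut, the scaled flow has congestion exactly d_i/|g_i| on this example.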

Now consider the case when a commodity is not simple. A single maximum flow computation
no longer suffices. However, for a given value of λ_i, say λ_mid, it is possible to check whether
there is a feasible flow with congestion λ_mid. To do so, we multiply each edge capacity by λ_mid
and then see if there exists a feasible flow in instance F' = (G, u · λ_mid, d_i). This computation
can be carried out via a maximum flow computation in a graph with O(n) nodes and O(m)
edges (see, for example, [38]). To find a good value of λ_i, we perform binary search over the
range of possible λ_i. The maximum possible value of λ_i is no more than the maximum edge
flow divided by the minimum edge capacity. The maximum edge flow is no more than nD_i,
where D_i is the total demand for commodity i. The minimum edge capacity is 1, and hence λ_i ≤ nD_i.
The total amount of flow in the network is D_i and hence some edge must have at least D_i/m
flow. The capacities are bounded by U, and hence the minimum possible value λ_i attains is
D_i/(mU). Each iteration of the while loop halves the range, and hence in

    O(\log(mnU)) = O(\log(nU))

iterations λ_i is within D_i/(mU) of λ_i*. When we stop we have a flow with congestion at most
λ_i* + D_i/(mU) ≤ 2λ_i*. ■
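The binary search in this proof can be sketched as follows. The feasibility test is passed in as a function; in the procedure it would be a maximum flow computation on scaled capacities, whereas here it is replaced by a hypothetical threshold oracle (`x >= true_opt`) purely for illustration.

```python
import math

def search_congestion(feasible, lam_low, lam_high, tol):
    """Binary search for a small congestion value accepted by `feasible`.

    Assumes feasible(lam_high) is True and that feasibility is monotone.
    Each round halves the interval [lam_low, lam_high].
    """
    rounds = 0
    while lam_high - lam_low > tol:
        lam_mid = (lam_high + lam_low) / 2
        if feasible(lam_mid):
            lam_high = lam_mid
        else:
            lam_low = lam_mid
        rounds += 1
    return lam_high, rounds

n, m, U, D = 6, 9, 4, 10.0       # illustrative instance parameters
true_opt = 1.7                   # hypothetical optimum congestion for the oracle
tol = D / (m * U)
lam, rounds = search_congestion(lambda x: x >= true_opt, D / (m * U), n * D, tol)
```

The search starts with an interval of length less than nD_i and stops at length D_i/(mU), so the number of rounds is at most about log_2(mnU).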

Rerouting  Flow 

Now, we show how, given a flow, we can iteratively reroute commodities in order to produce a
new flow that is closer to optimality. We give a procedure Decongest which takes a flow f
with congestion λ_0 and produces a new flow that is either 9ε-optimal or has congestion at most
λ_0/2. Decongest consists of a series of iterations of a while loop. We will analyze Decongest
by first bounding the number of iterations of the while loop and then bounding the time for one
iteration of this while loop. In the remainder of this section, when we use the term iteration
we refer to an iteration of this while loop.

Recall that a commodity is called ε-bad if its cost is too high. The basic idea is that
each iteration of Decongest reroutes an appropriately chosen fraction of the flow of an ε-bad
commodity onto the edges of a minimum-cost flow associated with this commodity (as described
below), in order to reduce congestion. We use a length function

    \ell(vw) = \frac{e^{\alpha f(vw)/u(vw)}}{u(vw)},

where the value of α will be specified later. This length function has the property that the length of an
edge vw is a function of the congestion, i.e., the fraction (possibly greater than 1) of the capacity
of that edge that is being used. Intuitively, by using lengths as costs in the computation of the
minimum-cost flow, we are penalizing edges with high congestion.
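A small numerical sketch may help show how such a length function penalizes congested edges. It assumes the exponential form ℓ(vw) = e^{αf(vw)/u(vw)}/u(vw) that is used in the proof of Lemma 2.4.3; the edge names and the value of α are arbitrary illustrative choices.

```python
import math

def edge_lengths(flow, capacity, alpha):
    """Length of each edge: exp(alpha * f / u) / u."""
    return {e: math.exp(alpha * flow[e] / capacity[e]) / capacity[e]
            for e in capacity}

# Two edges with the same capacity but different total flow.
capacity = {"hot": 2.0, "cold": 2.0}
flow = {"hot": 3.0, "cold": 0.5}      # congestion 1.5 versus 0.25
lengths = edge_lengths(flow, capacity, alpha=4.0)
ratio = lengths["hot"] / lengths["cold"]   # exp(4 * (1.5 - 0.25)) = exp(5)
```

With equal capacities, the ratio of the two lengths depends only on the difference in congestion, so the heavily used edge is exponentially more expensive in the minimum-cost flow computation.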

One of the important properties of this particular length function is that at the beginning of
procedure Decongest, we can choose α so that Relaxed Optimality Condition R1 is satisfied
and remains satisfied throughout the execution of procedure Decongest. The act of rerouting
flow gradually enforces Relaxed Optimality Condition R2. When both conditions are satisfied,
Theorem 2.3.5 can be used to infer that f is O(ε)-optimal. Alternatively, Decongest
terminates if λ decreases by more than a factor of 2.

DECONGEST(f, ε)
    λ_0 ← λ; α ← 2(1 + ε) λ_0^{-1} ε^{-1} ln(mε^{-1}).
    while λ > λ_0/2 and we have not detected that f is 9ε-optimal
        For each edge vw, ℓ(vw) ← e^{α f(vw)/u(vw)} / u(vw).
  (*)   Choose a commodity i as a candidate for rerouting.
        if commodity i is ε-bad
        then
            Formulate an auxiliary minimum-cost flow instance M = (G, λ · u, ℓ, d_i).
            Compute f_i*, a minimum-cost flow for M.
            For all vw ∈ E, f_i(vw) ← (1 − σ) f_i(vw) + σ f_i*(vw).
    return f

Figure 2.7: Procedure Decongest

More formally, procedure Decongest (see Figure 2.7) takes as input a multicommodity
flow f with congestion λ_0, where f satisfies the demands, and an error parameter ε. In each
iteration, we first choose an ε-bad commodity i and formulate an auxiliary minimum-cost flow
instance M = (G, λ · u, ℓ, d_i). The demand of each node v in the auxiliary problem is equal
to d_i(v), and the desired flow f_i*(vw) is constrained to be between −λ · u(vw) and λ · u(vw),
where λ is the current congestion. The objective is to minimize C_i*(λ) = Σ_{vw∈E} ℓ(vw) |f_i*(vw)|.
Given an optimal solution to this problem, we reroute a fraction σ of the flow f_i onto the
edges of f_i* by setting f_i(vw) ← (1 − σ) f_i(vw) + σ f_i*(vw), recompute the length function, and
repeat. Upon termination, Decongest returns an improved flow f that is either 9ε-optimal
or has congestion λ ≤ λ_0/2.
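The rerouting update is a convex combination of two flows for the same commodity, so it continues to satisfy the demands. The following sketch illustrates this on a three-edge network; the flows and the value σ = 0.25 are arbitrary illustrative choices, not values prescribed by the procedure.

```python
def reroute(f_i, f_star, sigma):
    """Move a fraction sigma of commodity i onto the min-cost flow f_star."""
    return {e: (1 - sigma) * f_i[e] + sigma * f_star[e] for e in f_i}

# Commodity with demand 2 from s to t: currently on path s-a-t,
# while the (hypothetical) minimum-cost flow uses the direct edge s-t.
f_i = {("s", "a"): 2.0, ("a", "t"): 2.0, ("s", "t"): 0.0}
f_star = {("s", "a"): 0.0, ("a", "t"): 0.0, ("s", "t"): 2.0}

new_f = reroute(f_i, f_star, sigma=0.25)
out_of_s = new_f[("s", "a")] + new_f[("s", "t")]   # still ships 2 units
```

Only a fraction of the flow moves in each iteration, which is what keeps the change in edge lengths small enough for the analysis that follows.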

An example of an iteration of Decongest appears in Figures 2.8, 2.9, and 2.10. This
example may help provide intuition before proceeding. In Figure 2.8 we have the same flow
as in Figure 1.2. In order to highlight the main ideas without using large numbers, we set the
values of α and σ somewhat arbitrarily in this example. The algorithm needs to find an ε-bad
commodity. In order to do so, we first compute the cost of each commodity. This calculation
is carried out in the bottom of the figure. Next, we need to compute minimum-cost flows
for each commodity. In the example we compute a minimum-cost flow only for commodity 3.
The minimum-cost flow problem and its solution are presented in Figure 2.9. We see that the
cost of the minimum-cost flow for commodity 3 is much less than the cost of the current flow
for commodity 3. We could verify that commodity 3 is indeed ε-bad, and hence we should
reroute it. Figure 2.10 shows the result of rerouting 1/2 of the flow of commodity 3 onto the
edges of the minimum-cost flow for commodity 3. We have also shown the edge lengths and
commodity costs. Observe that the cost of commodity 3 has decreased significantly. Also note
that although we did not reroute commodity 1, its cost has increased, because commodity 1 is
using some of the same edges as commodity 3. This example demonstrates why multicommodity
flow problems are difficult: rerouting the flow of one commodity in a better way may result in
the flow of another commodity being routed in a worse way.

Figure 2.8: Flows, edge lengths and costs of the current flows
[Graph not recoverable from the source. Legible cost annotations: C_2 = 1(16/3); the cost C_3 includes the terms 2(2048) and 2(2048).]

Figure 2.9: A minimum-cost flow for commodity 3
[Graph not recoverable from the source.]

Figure 2.10: The situation after rerouting
[Graph not recoverable from the source. Legible cost annotations:
C_1 = 1(16) + 1(16) = 32
C_2 = 1(16/3) = 16/3
C_3 = 1(32) + 1(32) + 1(16/3) + 1(16) + 1(16) = 304/3]

We now proceed with the analysis. Recall that if Relaxed Optimality Conditions R1 and R2
are satisfied, then the current flow is 9ε-optimal. We will first show that R1 is always satisfied.
In particular, we now show that if we set α = 2(1 + ε) λ_0^{-1} ε^{-1} ln(mε^{-1}) at the beginning of a
call to Decongest, then Relaxed Optimality Condition R1 is satisfied throughout that call.

Lemma 2.4.3 If f is a multicommodity flow that satisfies demands and α ≥ (1 + ε) λ^{-1} ε^{-1} ln(mε^{-1}),
then f and length function ℓ(vw) = e^{α f(vw)/u(vw)} / u(vw) satisfy Relaxed Optimality Condition R1.

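Proof: We show that if an edge v'w' violates the first part of Relaxed Optimality Condition
R1 then it must satisfy the second part. In other words, if

    \lambda \cdot u(v'w') > (1+\epsilon) f(v'w'),    (2.11)

then

    u(v'w')\ell(v'w') \le \frac{\epsilon}{m} \sum_{vw \in E} \ell(vw)u(vw).

We can use (2.11) to upper bound the length of edge v'w':

    \ell(v'w') = \frac{e^{\alpha f(v'w')/u(v'w')}}{u(v'w')} < \frac{e^{\alpha\lambda/(1+\epsilon)}}{u(v'w')}.

Let v''w'' be an edge such that λ(v''w'') = λ and hence u(v''w'')ℓ(v''w'') = e^{αλ}. Then

    \frac{\sum_{vw \in E} \ell(vw)u(vw)}{u(v'w')\ell(v'w')} \ge \frac{u(v''w'')\ell(v''w'')}{u(v'w')\ell(v'w')} \ge \frac{e^{\alpha\lambda}}{e^{\alpha\lambda/(1+\epsilon)}} = e^{\alpha\lambda(1 - 1/(1+\epsilon))} = e^{\alpha\lambda(\epsilon/(1+\epsilon))}.

Now if we plug the lower bound on α into the exponent, we see that

    e^{\alpha\lambda(\epsilon/(1+\epsilon))} \ge e^{(1+\epsilon)\lambda^{-1}\epsilon^{-1}\ln(m\epsilon^{-1}) \cdot \lambda(\epsilon/(1+\epsilon))} = e^{\ln(m\epsilon^{-1})} = m\epsilon^{-1},

where the last equality follows by canceling terms. ■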

Corollary 2.4.4 Relaxed Optimality Condition R1 is always satisfied throughout a call to
procedure Decongest.

Proof: At the beginning of procedure Decongest, α is set equal to 2(1 + ε) λ_0^{-1} ε^{-1} ln(mε^{-1}),
and throughout Decongest λ ≥ λ_0/2 and hence λ_0^{-1} ≥ λ^{-1}/2. Therefore α ≥ (1 + ε) λ^{-1} ε^{-1} ln(mε^{-1})
throughout. ■
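Lemma 2.4.3 can be checked numerically on a small instance. The sketch below assumes that Condition R1 has the two-part form used in the proof above: for every edge, either (1 + ε) f(vw) ≥ λ u(vw), or u(vw)ℓ(vw) ≤ (ε/m) Σ ℓ(vw)u(vw). The function name `satisfies_r1` and the instance are illustrative only.

```python
import math

def satisfies_r1(flow, capacity, eps, alpha):
    """Check the two-part condition R1 (as assumed above) on every edge."""
    m = len(capacity)
    lam = max(flow[e] / capacity[e] for e in capacity)
    # u(vw) * l(vw) simplifies to exp(alpha * f(vw) / u(vw)).
    ul = {e: math.exp(alpha * flow[e] / capacity[e]) for e in capacity}
    total = sum(ul.values())
    return all((1 + eps) * flow[e] >= lam * capacity[e]
               or ul[e] <= (eps / m) * total
               for e in capacity)

capacity = {"e1": 1.0, "e2": 1.0, "e3": 1.0, "e4": 1.0}
flow = {"e1": 1.0, "e2": 0.95, "e3": 0.5, "e4": 0.1}
eps, m, lam = 0.1, 4, 1.0

alpha_lemma = (1 + eps) / (lam * eps) * math.log(m / eps)  # bound of Lemma 2.4.3
with_lemma_alpha = satisfies_r1(flow, capacity, eps, alpha_lemma)
with_small_alpha = satisfies_r1(flow, capacity, eps, 1.0)
```

On this instance the condition holds for the value of α required by the lemma and fails for the much smaller value α = 1, which illustrates why α must grow like ε^{-1} ln(mε^{-1}).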

We have just seen that Relaxed Optimality Condition R1 is always satisfied. By the
contrapositive of Theorem 2.3.3, if the current flow is not 9ε-optimal, then Relaxed Optimality
Condition R2 must be violated. The key to showing the efficiency of our algorithm will be
to show that when R2 is not satisfied, then one iteration of Decongest makes “progress.”
Although the overall goal of our algorithm is to make progress by decreasing the congestion
λ, each iteration of our algorithm need not actually decrease λ. In order to measure progress
of our algorithm, we introduce a potential function Φ = Φ(u, f, ℓ). We will specify a set of
conditions on this potential function that will suffice to derive a good bound on the number
of iterations of procedure Decongest. We will then give a particular potential function and
show that it meets these criteria.

Notice that the condition of the while loop is “λ > λ_0/2 and we have not
detected that f is 9ε-optimal.” It is trivial to detect when λ ≤ λ_0/2, and hence we can assume
that as soon as λ ≤ λ_0/2, the call to Decongest terminates. Checking whether f is 9ε-optimal
is not as easy and in fact, if not done carefully, can dominate the running time of Decongest.
For ease of presentation, we assume, for now, that as soon as the condition of the while loop in
Decongest is not satisfied, the algorithm detects it. In other words, at the beginning of each
iteration, the current flow f is not 9ε-optimal. In particular, since Relaxed Optimality Condition
R1 is always satisfied, at the beginning of each iteration Relaxed Optimality Condition R2 is
not satisfied. For the deterministic algorithms, at the beginning of each iteration, R2 actually
is not satisfied, whereas for the randomized algorithms, R2 may be satisfied at the beginning of
an iteration. In either case, we will eventually show how to remove this assumption.

Let λ_0 be the congestion of the initial flow passed to Decongest, and consider the sequence
of flows at the beginning of each iteration of Decongest. We call an iteration j productive if
the commodity i chosen on line (*) of Decongest is ε-bad and unproductive otherwise.

We now state a trivial lemma which captures the different factors that affect the running
time of an implementation of Decongest.

Lemma 2.4.5 For a call to Decongest, let I_P be the number of productive iterations and I_U
be the number of unproductive iterations. Let T_P be the time spent in one productive iteration,
and T_U be the time spent in one unproductive iteration. Assume that the procedure terminates as
soon as f is 9ε-optimal. Then the running time of Decongest is

    O(I_P T_P + I_U T_U).    (2.12)


In most cases, the dominant term is I_P T_P, the time spent in productive iterations. We
proceed to bound I_P first.

Let Φ_0, ..., Φ_p be the values that potential function Φ takes on during successive iterations.
We call a potential function Φ useful if throughout a call to procedure Decongest it satisfies
the following four conditions:

U1) Φ_0 ≤ m e^{αλ_0},

U2) Φ ≥ e^{αλ_0/2} at the beginning of the last iteration,

U3) If iteration j is unproductive then Φ_{j+1} = Φ_j, j = 0, ..., p − 1.

U4) If iteration j is productive then Φ_j − Φ_{j+1} = Ω((ε²/k) Φ_j), j = 0, ..., p − 1.

Note  that  the  existence  of  a  useful  potential  function  actually  implies  something  about  the 
performance  of  Decongest. 

We now show that if a potential function is useful, then we can establish a bound on the
number of productive iterations of the while loop during one call to Decongest.
Lemma 2.4.6 Let Φ be a useful potential function. Then procedure Decongest terminates
after O(ε^{-3} k log n) productive iterations. If the initial flow is O(ε)-optimal, then Decongest
terminates after O(ε^{-2} k log n) productive iterations.

Proof: First we bound the number of times Φ can be reduced by a factor of e throughout a call
to Decongest. Since Φ is useful, initially Φ ≤ m e^{αλ_0} and in the last iteration Φ ≥ e^{αλ_0/2}.
By U3 and U4, Φ never increases. Thus the number of times Φ can decrease by a constant
factor is just the logarithm of the ratio of the initial and final values of Φ, which is

    \ln\left(\frac{m e^{\alpha\lambda_0}}{e^{\alpha\lambda_0/2}}\right) = O(\alpha\lambda_0 + \log m) = O(\alpha\lambda_0).

The last equality follows by plugging in the value of α specified in Decongest.

By U4, each productive iteration results in a reduction in Φ of Ω((ε²/k) Φ). Since 1 − x ≤ e^{-x},
it follows that every O(kε^{-2}) iterations reduce Φ by at least a factor of e.

Multiplying the number of productive iterations it takes to reduce Φ by a factor of e by the
number of times Φ can be reduced by a factor of e,
we see that Decongest executes O(αλ_0 kε^{-2}) productive iterations. Plugging in the value of
α, we get that the number of productive iterations is

    O(\alpha\lambda_0 k\epsilon^{-2}) = O(\epsilon^{-1}\lambda_0^{-1}\ln(m\epsilon^{-1})\,\lambda_0 k\epsilon^{-2}) = O(\epsilon^{-3} k \ln(m\epsilon^{-1})).

We have assumed that ε is at least inverse polynomial in n, and we maintain that λ ≥ λ_0/2, so
the number of productive iterations is in fact O(ε^{-3} k log n).

If the initial flow is O(ε)-optimal then we know that λ* ≥ λ_0/(1 + O(ε)), so throughout
Decongest, λ never goes below λ_0/(1 + O(ε)). Thus, we have the tighter bound on the
possible range of the potential function of e^{αλ_0/(1+O(ε))} ≤ Φ ≤ m e^{αλ_0}, and the logarithm of
the ratio of these two bounds is O(εαλ_0 + log m) = O(log n). To decrease the
potential function by a constant factor takes

    O(k\epsilon^{-2})

productive iterations. Continuing as above, we get that the number of iterations is O(ε^{-2} k log n). ■
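To get a feel for the size of this bound, the count in the proof can be evaluated numerically. The sketch below is only illustrative: it assumes each productive iteration multiplies the potential by at most (1 − cε²/k), where the constant c is a hypothetical stand-in for the constant hidden in the Ω of condition U4, and it works with logarithms of the potential to avoid overflow.

```python
import math

def productive_iteration_bound(m, k, eps, c=1.0 / 16.0):
    """Upper bound on productive iterations for an arbitrary initial flow.

    c is a hypothetical constant standing in for the one hidden in U4.
    """
    # alpha * lambda_0 for alpha = 2(1 + eps) / (lambda_0 * eps) * ln(m / eps)
    alpha_lambda0 = 2 * (1 + eps) / eps * math.log(m / eps)
    # ln of (m * e^{alpha*lambda_0}) / e^{alpha*lambda_0 / 2}
    log_ratio = math.log(m) + alpha_lambda0 / 2
    shrink = c * eps ** 2 / k
    # Iterations needed for (1 - shrink)^t to cover the whole range of the potential.
    return math.ceil(log_ratio / -math.log1p(-shrink))

bound = productive_iteration_bound(m=100, k=10, eps=0.1)
```

Halving ε or doubling k increases the returned value, in line with the ε^{-3} k dependence of the lemma.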




We use the particular potential function Φ = Σ_{vw∈E} u(vw)ℓ(vw). To complete the proof
that Decongest terminates after a small number of productive iterations, we need to show
that Φ is useful. We begin by showing that it satisfies the first three conditions in the definition
of useful.

Lemma 2.4.7 Let Φ = Σ_{vw∈E} u(vw)ℓ(vw). Then throughout one call to Decongest, U1,
U2, and U3 are satisfied.

Proof: Initially for each edge vw, λ(vw) ≤ λ_0, thus

    u(vw)\ell(vw) \le e^{\alpha\lambda_0}

and

    \sum_{vw \in E} u(vw)\ell(vw) \le \sum_{vw \in E} e^{\alpha\lambda_0} \le m e^{\alpha\lambda_0}.

At the beginning of the last iteration, Decongest has not terminated, and therefore λ ≥ λ_0/2.
Thus there must be at least one edge v'w' for which λ(v'w') ≥ λ_0/2. Since ℓ(vw) and u(vw) are
always non-negative, u(vw)ℓ(vw) is non-negative for each edge vw and hence

    \sum_{vw \in E} u(vw)\ell(vw) \ge u(v'w')\ell(v'w') \ge e^{\alpha\lambda_0/2}.

Finally, if iteration j is unproductive, then commodity i is not ε-bad and no rerouting takes
place. Hence, neither the flow nor the length function changes and Φ remains unchanged. ■

The following lemma establishes that the potential function Φ = Σ_{vw∈E} u(vw)ℓ(vw) satisfies
U4. This lemma is the heart of the analysis of Decongest.
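The bounds behind U1 and U2 are easy to observe numerically. The sketch below assumes the exponential length function from the proof of Lemma 2.4.3, for which u(vw)ℓ(vw) = e^{αf(vw)/u(vw)}; the instance and the value of α are arbitrary.

```python
import math

def potential(flow, capacity, alpha):
    """Potential: sum over edges of u(vw) * l(vw) = exp(alpha * f / u)."""
    return sum(math.exp(alpha * flow[e] / capacity[e]) for e in capacity)

capacity = {"ab": 2.0, "bc": 1.0, "ac": 4.0}
flow = {"ab": 3.0, "bc": 0.5, "ac": 2.0}
alpha = 5.0

lam = max(flow[e] / capacity[e] for e in capacity)   # congestion, here 1.5
phi = potential(flow, capacity, alpha)
lower = math.exp(alpha * lam)                 # contribution of the most congested edge
upper = len(capacity) * math.exp(alpha * lam) # every edge contributes at most this much
```

The potential is therefore sandwiched between e^{αλ} and m e^{αλ}, so it tracks the congestion λ up to a factor of m.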

Lemma 2.4.8 Let ε < 1, let j be a productive iteration, let i
be an ε-bad commodity, and let f_i* be a minimum-cost flow for this commodity as computed in
Decongest. Let the new flow for commodity i be defined by f_i(vw) ← (1 − σ) f_i(vw) + σ f_i*(vw),
where σ is the rerouting fraction chosen in Decongest.
Then Φ_j − Φ_{j+1} = Ω((ε²/k) Φ_j).

Proof: Denote by ℓ(vw) and ℓ'(vw) the length of edge vw before and after rerouting, respectively.
Let δ(vw) denote the increase in flow on vw due to rerouting. Recall that, after rerouting,
the flow of the rerouted commodity i on vw is |(1 − σ) f_i(vw) + σ f_i*(vw)|, and hence
|δ(vw)| ≤ σ |f_i*(vw) − f_i(vw)| ≤ σ (|f_i(vw)| + |f_i*(vw)|). Moreover, since both f_i and f_i* have
congestion at most λ, |δ(vw)| ≤ 2σλ u(vw).

By definition of the length function,

    \ell'(vw) = \frac{e^{\alpha (f(vw) + \delta(vw))/u(vw)}}{u(vw)} = \frac{e^{\alpha f(vw)/u(vw) + \eta}}{u(vw)},

where η = αδ(vw)/u(vw). Observe that |η| ≤ 2ασλ ≤ ε/4 ≤ 1/4. Using the Taylor series, we
see that |η| ≤ ε/4 ≤ 1/4 implies that for all x, e^{x+η} ≤ e^x + ηe^x + (ε/2)|η|e^x. Therefore, we have:

    \ell'(vw) \le \ell(vw) + \eta\,\ell(vw) + \frac{\epsilon}{2}|\eta|\,\ell(vw)
              \le \ell(vw) + \frac{\alpha\delta(vw)}{u(vw)}\ell(vw) + \frac{\alpha\epsilon|\delta(vw)|}{2u(vw)}\ell(vw).
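The Taylor-series inequality used here, e^{η} ≤ 1 + η + (ε/2)|η| whenever |η| ≤ ε/4 ≤ 1/4, can be spot-checked numerically. The following sketch simply evaluates both sides on a grid of values; the grid itself is an arbitrary choice made for illustration.

```python
import math

def taylor_bound_holds(eta, eps):
    """Check exp(eta) <= 1 + eta + (eps / 2) * |eta| for one pair of values."""
    return math.exp(eta) <= 1 + eta + (eps / 2) * abs(eta)

# For each eps, try 50 positive and 50 negative values of eta with |eta| <= eps / 4.
checks = [
    taylor_bound_holds(sign * (eps / 4) * t / 50, eps)
    for eps in (0.05, 0.25, 0.5, 1.0)
    for t in range(1, 51)
    for sign in (-1, 1)
]
all_hold = all(checks)
```

The check succeeds because the second-order term η²/2 (times a small constant) is at most (ε/4)|η|, which is below the allowance (ε/2)|η|.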


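We use this bound to give a lower bound on the decrease in the potential function.

    \Phi_j - \Phi_{j+1} = \sum_{vw \in E} (\ell(vw) - \ell'(vw))\,u(vw)
        \ge \alpha\sigma \sum_{vw \in E} (|f_i(vw)| - |f_i^*(vw)|)\,\ell(vw) - \alpha\sigma\frac{\epsilon}{2} \sum_{vw \in E} (|f_i(vw)| + |f_i^*(vw)|)\,\ell(vw).

By the definitions of C_i and C_i*(λ), C_i = Σ_{vw∈E} ℓ(vw) |f_i(vw)| and C_i*(λ) = Σ_{vw∈E} ℓ(vw) |f_i*(vw)|,
hence we can rewrite the last bound as

    \Phi_j - \Phi_{j+1} \ge \alpha\sigma (C_i - C_i^*(\lambda)) - \alpha\sigma\frac{\epsilon}{2} (C_i + C_i^*(\lambda)).

Since C_i*(λ) ≤ C_i, we can bound the last term by ασεC_i. We can use the definition of an ε-bad
commodity to establish a lower bound on the C_i − C_i*(λ) that appears in the first term on the
right-hand side. Plugging in, we get

    \Phi_j - \Phi_{j+1} \ge \alpha\sigma (C_i - C_i^*(\lambda)) - \alpha\sigma\epsilon C_i
        \ge \alpha\sigma \left(\epsilon C_i + \epsilon\frac{\lambda}{k} \sum_{vw \in E} \ell(vw)u(vw)\right) - \alpha\sigma\epsilon C_i
        = \frac{\alpha\sigma\epsilon\lambda}{k}\,\Phi_j.    (2.13)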
Plugging in the value of σ, we get that

    \Phi_j - \Phi_{j+1} \ge \frac{\alpha\sigma\epsilon\lambda}{k}\,\Phi_j = \Omega\left(\frac{\epsilon^2}{k}\,\Phi_j\right).  ■


Combining Lemmas 2.4.6, 2.4.7, and 2.4.8 we get the following lemma:

Lemma 2.4.9 Assume that as soon as f is 9ε-optimal, Decongest terminates. Then I_P =
O(ε^{-3} k log n) if the initial flow is arbitrary and O(ε^{-2} k log n) if the initial flow is O(ε)-optimal.

This bound on I_P holds for both the randomized and the deterministic version of Decongest.

Implementations  of  Decongest 

We now give two different implementations of Decongest, a deterministic implementation
and a more efficient randomized one. For each one, we will explain the algorithm and then
bound I_U, T_P, and T_U. We will also discuss the issue of detecting when f is 9ε-optimal. For all
variations, the only computation-intensive part of an iteration of Decongest, be it productive
or unproductive, is finding an ε-bad commodity and computing minimum-cost flows. All the
rest can be done in O(m) time. Thus, we will concentrate on finding an ε-bad commodity
and computing minimum-cost flows. Throughout this section, we treat the minimum-cost flow
subroutine as a black box, and shall discuss its implementation later.

Before  beginning,  we  discuss  how  to  perform  a  termination  check  in  all  variants  of  the 
algorithm. 

Lemma 2.4.10 Let T_MCF = T_MCF(M) be the time to compute a minimum-cost flow for instance
M. Then given a flow f, we can determine if f is 9ε-optimal in O(k T_MCF) time.

Proof: In order to detect termination, we can compare λ to

    \frac{\sum_{i=1}^{k} C_i^*(\lambda)}{\sum_{vw \in E} \ell(vw)u(vw)},    (2.14)

the lower bound given by Lemma 2.2.4. The numerator of (2.14) can be computed by computing
k minimum-cost flows. The denominator can be computed in O(m) time. ■
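A termination check of this kind might be sketched as follows. The code assumes that the lower bound of Lemma 2.2.4 has the form Σ_i C_i*(λ) / Σ_{vw} ℓ(vw)u(vw), as in (2.14), and that the k minimum-cost flow values have already been computed; the numbers below are hypothetical and only serve to exercise the comparison.

```python
def congestion_lower_bound(min_costs, lengths, capacity):
    """Assumed form of (2.14): sum of C_i*(lambda) over sum of l(vw) * u(vw)."""
    denominator = sum(lengths[e] * capacity[e] for e in capacity)
    return sum(min_costs) / denominator

def detected_optimal(lam, min_costs, lengths, capacity, eps):
    """Sufficient test: lam is within a (1 + 9*eps) factor of the lower bound."""
    bound = congestion_lower_bound(min_costs, lengths, capacity)
    return lam <= (1 + 9 * eps) * bound

capacity = {"ab": 2.0, "bc": 1.0}
lengths = {"ab": 0.5, "bc": 3.0}
min_costs = [2.0, 1.8]            # hypothetical values of C_i*(lambda), k = 2

lb = congestion_lower_bound(min_costs, lengths, capacity)      # 3.8 / 4.0
close = detected_optimal(1.0, min_costs, lengths, capacity, eps=0.01)
far = detected_optimal(1.2, min_costs, lengths, capacity, eps=0.01)
```

Since the bound is at most λ*, passing this test certifies that λ ≤ (1 + 9ε)λ*; the cost is dominated by the k minimum-cost flow computations that produce `min_costs`.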

Now we describe the straightforward deterministic implementation. The simplest way to
find an ε-bad commodity is to compute the costs C_i = Σ_{vw∈E} ℓ(vw) |f_i(vw)| and the costs of the
minimum-cost flows C_i*(λ) and compare them to see if commodity i is ε-bad. In the worst case we
need to check all k commodities. Computing the cost of one commodity takes O(m) time, and
therefore the costs of all commodities can be computed in O(km) time. Hence, an iteration can
be implemented in the time it takes to perform k minimum-cost flow computations plus O(km)
additional time. After each iteration, we can perform a termination check. By Lemma 2.4.10,
the time for a termination check is the same as the time for an iteration, and hence including
the time spent performing termination checks does not increase the asymptotic running time.
Further, since we perform a termination check after each iteration, we claim that every iteration
is productive. At the start of an iteration, the flow is not 9ε-optimal, and by Lemma 2.4.3,
Relaxed Optimality Condition R1 is always satisfied. Thus by the contrapositive of Theorem
2.3.3, Relaxed Optimality Condition R2 must not hold and there must be an ε-bad commodity.
Since in each iteration we check each commodity to see if it is ε-bad, if one exists, we will find
it.

We  summarize  this  discussion  in  the  following  lemma: 

Lemma 2.4.11 Let T_MCF = T_MCF(M) be the time to compute a minimum-cost flow for instance
M. Procedure Decongest can be implemented to run in O(ε^{-3} k² log n (T_MCF)) time. If the initial
flow is O(ε)-optimal, Decongest can be implemented to run in O(ε^{-2} k² log n (T_MCF)) time.

Proof: By the above discussion, T_P = O(k T_MCF), I_U = 0, and as soon as f is 9ε-optimal,
Decongest terminates. Plugging these bounds and the bounds of Lemma 2.4.6 into equation
(2.12) yields the lemma. ■

Deterministically, it seems necessary to know the values of the k minimum-cost flows in each
iteration. However, by using a simple randomized strategy, we can show that it is necessary
to compute only expected O(ε^{-1}) minimum-cost flows in each iteration. When ε^{-1} = o(k), for
example when ε is a fixed constant, this randomized strategy leads to faster algorithms.


We  begin  by  giving  the  strategy: 
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Randomized  Strategy  1:  When  choosing  a  commodity  to  r<>routp,  choose  a  commodity 
with  probability  proportional  to  its  cost,  i.e.,  Pr[commodity  i  is  chosen]  =  Cj/C,  whe»‘e  C  - 

zUiCj- 

Lemma  2.4.12  Suppose  /  is  not  9e-optimal  and  a  comnwdity  i  is  chosen  at  random  using 
Randomized  Strategy  1.  Then  Pr[t  is  c-bad]  >  e.  If  we  implement  Decongest  using  Randomized 
Strategy  1  then  E[Iu]  =  0((~^lp),  and  we  can  amortize  the  running  times  so  that  Tu  =  0(T'mcf  + 
*:)  and  Tp  =  0{TMcr  +  mk). 

Proof: Let $C = \sum_{j=1}^{k} C_j$. If the flow is not $9\epsilon$-optimal, then it must be the case that R2 is violated, i.e.,

    $$\sum_{j \text{ is } \epsilon\text{-bad}} C_j \ge \epsilon C. \qquad (2.15)$$

We choose a commodity $i$ with probability proportional to its cost, i.e., with probability $C_i/C$. The probability that a commodity chosen in this manner is $\epsilon$-bad is just

    $$\Pr[i \text{ is } \epsilon\text{-bad}] = \sum_{j \text{ is } \epsilon\text{-bad}} C_j/C \ge \epsilon C/C = \epsilon,$$

where the inequality follows from (2.15). Thus with probability at least $\epsilon$, we have chosen an $\epsilon$-bad commodity.

In each iteration, the probability of finding an $\epsilon$-bad commodity is at least $\epsilon$; thus in an expected $O(\epsilon^{-1})$ iterations, we will find such a commodity. The iteration in which we find an $\epsilon$-bad commodity is productive while the rest are unproductive, so $E[I_u] = O(\epsilon^{-1} I_p)$. Consider a sequence of unproductive iterations followed by one productive iteration. At the beginning of this sequence, we compute, in $O(km)$ time, the values $C_i$ for $i = 1, \ldots, k$. We then execute a sequence of unproductive iterations. Each involves choosing a commodity, which can be performed in $O(k)$ time, and then computing one minimum-cost flow. Thus each one of these iterations takes $O(T_{MCF} + k)$ time. Finally, we actually execute a productive iteration in $O(T_{MCF} + k)$ time. We can charge the computation of the commodity costs to this productive iteration, so $T_p = O(mk + T_{MCF})$. ■
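The selection step of Randomized Strategy 1 can be sketched as follows. This is only an illustration under stated assumptions: the names `commodity_costs` and `choose_by_cost` and the dictionary representation of flows are hypothetical, and the cost of commodity $i$ is taken to be $C_i = \sum_{vw} |f_i(vw)|\,\ell(vw)$. Computing all costs takes $O(km)$ time, and one draw takes $O(k)$ time, matching the accounting in the proof above.

```python
import random


def commodity_costs(flows, lengths):
    """Return [C_1, ..., C_k], where C_i = sum over edges of |f_i(e)| * l(e).

    flows[i] maps an edge name to the flow of commodity i on that edge
    (hypothetical representation); lengths maps an edge name to l(e).
    """
    return [sum(abs(f_e) * lengths[e] for e, f_e in f_i.items()) for f_i in flows]


def choose_by_cost(costs, rng):
    """Return index i with probability costs[i] / sum(costs), in O(k) time."""
    total = sum(costs)
    x = rng.random() * total          # uniform in [0, total)
    running = 0.0
    for i, c in enumerate(costs):
        running += c
        if x < running:
            return i
    return len(costs) - 1             # guard against floating-point round-off
```

In a run of Decongest one would compute the costs once per productive iteration and then call `choose_by_cost` repeatedly until an $\epsilon$-bad commodity is found.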


Corollary 2.4.13 Let $T_{MCF} = T_{MCF}(M)$ be the time to compute a minimum-cost flow for instance $M$ and assume that Decongest terminates as soon as $f$ is $9\epsilon$-optimal. Then Procedure Decongest can be implemented to run in expected $O(\epsilon^{-3} k \log n (\epsilon^{-1} T_{MCF} + \epsilon^{-1} k + mk))$ time. If the initial flow is $O(\epsilon)$-optimal, it can be implemented to run in expected $O(\epsilon^{-2} k \log n (\epsilon^{-1} T_{MCF} + \epsilon^{-1} k + mk))$ time.

Proof: Plug the values from Lemmas 2.4.12 and 2.4.6 into (2.12). ■

Observe that if $k < n$ (this is the case when Lemma 2.2.1 is applied) the time to compute the cost of all current flows is dominated by the time to compute a minimum-cost flow, and the $mk$ and $\epsilon^{-1}k$ terms disappear from the bound in Corollary 2.4.13. On the other hand, if $k$ is large, then the dominant step may be computing the costs and choosing a commodity. In this case, we can reduce the time by using a somewhat more involved strategy:

Randomized Strategy 2: Pick an edge with probability proportional to the product of the length of the edge and the flow through this edge. Let the chosen edge be $vw$. Then choose a commodity with probability proportional to its flow on edge $vw$. In other words, let $F = \sum_{vw \in E} \ell(vw) f(vw)$. Choose an edge $vw \in E$ with $\Pr[vw \text{ is chosen}] = \ell(vw) f(vw)/F$. Choose a commodity $i$ with $\Pr[i \text{ is chosen}] = |f_i(vw)|/f(vw)$.

We now show that this strategy still chooses commodity $i$ with probability proportional to its cost and reduces the time for random selection from $O(km)$ to the minimum of $O(m + k)$ and $O(m \log k)$. By doing so, we are actually picking a commodity with probability proportional to its cost, without ever explicitly computing these costs.

Lemma 2.4.14 Suppose a commodity $i$ is chosen according to Randomized Strategy 2. Then $\Pr[i \text{ is } \epsilon\text{-bad}] \ge \epsilon$. Assume that we implement Decongest using Randomized Strategy 2 and terminate as soon as $f$ is $9\epsilon$-optimal. Then, for this strategy, $E[I_u] = O(\epsilon^{-1} I_p)$.
Proof: Let $F = \sum_{vw \in E} f(vw)\ell(vw)$. The event that $i$ is chosen is the union of $m$ disjoint events, one for each edge. Thus

    $$\Pr[i \text{ is chosen}] = \sum_{vw \in E} \Pr[vw \text{ is chosen}] \cdot \Pr[i \text{ is chosen} \mid vw \text{ is chosen}]$$
    $$= \sum_{vw \in E} \frac{\ell(vw) f(vw)}{F} \cdot \frac{|f_i(vw)|}{f(vw)} = \sum_{vw \in E} \frac{|f_i(vw)|\,\ell(vw)}{F} = \frac{C_i}{F}.$$

But $F = \sum_{vw \in E} \ell(vw) f(vw) = \sum_{i=1}^{k} C_i = C$, so we are choosing a commodity with exactly the same probability as in Randomized Strategy 1. The lemma follows. ■
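The identity in this proof can be checked on a small instance with exact rational arithmetic. The sketch below is illustrative only: the function names and the dictionary representation are hypothetical, and it assumes that the total flow on an edge is $f(vw) = \sum_i |f_i(vw)|$.

```python
from fractions import Fraction


def strategy2_marginals(flows, lengths):
    """Exact Pr[commodity i is chosen] under the two-stage rule of Strategy 2.

    flows[i] maps an edge name to f_i(e); lengths maps an edge name to l(e).
    """
    edges = list(lengths)
    # Assumed definition: f(e) is the sum of |f_i(e)| over all commodities.
    total_flow = {e: sum(abs(Fraction(f.get(e, 0))) for f in flows) for e in edges}
    F = sum(Fraction(lengths[e]) * total_flow[e] for e in edges)
    probs = []
    for f in flows:
        p = Fraction(0)
        for e in edges:
            if total_flow[e] == 0:
                continue  # an edge carrying no flow is never picked
            p_edge = Fraction(lengths[e]) * total_flow[e] / F
            p_commodity = abs(Fraction(f.get(e, 0))) / total_flow[e]
            p += p_edge * p_commodity
        probs.append(p)
    return probs


def cost_shares(flows, lengths):
    """Exact C_i / C, the selection probabilities of Strategy 1."""
    costs = [sum(abs(Fraction(v)) * Fraction(lengths[e]) for e, v in f.items())
             for f in flows]
    C = sum(costs)
    return [c / C for c in costs]
```

For any input with positive total cost the two functions should return the same list, which is the content of Lemma 2.4.14.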

We now give two ways to implement Randomized Strategy 2. The first is the straightforward one in which we pick an edge and then pick a commodity going through that edge. In each iteration, we first choose a random number $x \in [0, 1]$ and then find the smallest $t$ for which

    $$a_t = \sum_{vw \le t} \frac{\ell(vw) f(vw)}{F} \ge x, \qquad (2.16)$$

where the sum is over the first $t$ edges in a fixed ordering of $E$. This procedure gives us an edge with the right probability. Given the value of $a_{t-1}$, $a_t$ can be computed in $O(1)$ time, and thus $a_1, \ldots, a_m$ can be computed in $O(m)$ time. Analogously, we can choose a commodity using this edge $vw$ by choosing a random number $y \in [0, 1]$ and then finding the smallest $t$ for which

    $$b_t = \sum_{i \le t} \frac{|f_i(vw)|}{f(vw)} \ge y. \qquad (2.17)$$

The $b_i$, $i = 1, \ldots, k$, can be computed in $O(k)$ time. Once we have chosen a commodity, a minimum-cost flow is computed in $O(T_{MCF})$ time. If it turns out that the commodity is $\epsilon$-bad, it is rerouted. Thus we have shown:


Lemma 2.4.15 An iteration of Decongest can be implemented using Randomized Strategy 2 so that $T_u = T_p = O(T_{MCF} + k + m) = O(T_{MCF} + k)$.


This  yields  the  following  corollary: 
Corollary 2.4.16 Let $T_{MCF} = T_{MCF}(M)$ be the time to compute a minimum-cost flow for instance $M$ and assume that Decongest terminates as soon as $f$ is $9\epsilon$-optimal. Then Procedure Decongest can be implemented to run in expected $O(\epsilon^{-4} k \log n (T_{MCF} + m + k))$ time. If the initial flow is $O(\epsilon)$-optimal, it can be implemented to run in expected $O(\epsilon^{-3} k \log n (T_{MCF} + m + k))$ time.

Proof: Plug the values from Lemmas 2.4.15 and 2.4.6 into (2.12). ■

Alternatively, we can use slightly more involved data structures and derive bounds that depend on $\log k$ rather than $k$.

Lemma 2.4.17 An iteration of Decongest can be implemented using Randomized Strategy 2 so that $T_u = O(T_{MCF} + \log k + m) = O(T_{MCF} + \log k)$ and $T_p = O(T_{MCF} + m \log k)$.

Proof: In the previous strategy, at each iteration, given a random value $y$, we had to check (2.17) for all $k$ commodities. To perform this computation more efficiently, we can store the values $|f_i(vw)|/f(vw)$ in a balanced binary tree. There is one tree for each edge $vw$. In tree $vw$, leaf $i$, $i = 1, \ldots, k$, contains the value $|f_i(vw)|/f(vw)$. Each internal node of the tree contains the sum of the leaf values in its subtree. It is well known that, given such a data structure, the leftmost leaf satisfying (2.17) can be found in $O(\log k)$ time. In order to maintain these data structures, we must update the appropriate trees each time a flow value changes. The only changes occur when flow is rerouted, and each rerouting only changes the value of the flow for one commodity on at most $m$ edges. Therefore, the updates associated with one routing step can be accomplished in $O(m \log k)$ time. Each unproductive iteration performs one tree search and one minimum-cost flow, while each productive iteration performs one tree search, one minimum-cost flow and one rerouting. The lemma follows. ■
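A minimal sketch of such a tree for a single edge is given below. The class name `SumTree` and its methods are hypothetical. As a design choice that differs slightly from the description above, the leaves hold the unnormalized weights $|f_i(vw)|$ and the random threshold is scaled by the root sum, so that changing one commodity's flow touches only one leaf-to-root path.

```python
import random


class SumTree:
    """Complete binary tree over k leaves; each internal node stores its subtree sum."""

    def __init__(self, weights):
        self.k = len(weights)
        self.size = 1
        while self.size < self.k:
            self.size *= 2
        self.tree = [0] * (2 * self.size)
        for i, w in enumerate(weights):
            self.tree[self.size + i] = w
        for node in range(self.size - 1, 0, -1):
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]

    def total(self):
        return self.tree[1]

    def update(self, i, w):
        """Set leaf i to w and repair the sums on its path to the root: O(log k)."""
        node = self.size + i
        self.tree[node] = w
        node //= 2
        while node >= 1:
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]
            node //= 2

    def find(self, y):
        """Return the smallest leaf t with w_0 + ... + w_t >= y: O(log k)."""
        node = 1
        while node < self.size:
            left = 2 * node
            if y <= self.tree[left]:
                node = left
            else:
                y -= self.tree[left]
                node = left + 1
        return min(node - self.size, self.k - 1)

    def sample(self, rng):
        """Draw leaf i with probability w_i / total."""
        y = (1.0 - rng.random()) * self.tree[1]   # y lies in (0, total]
        return self.find(y)
```

With one such tree per edge, a rerouting that changes one commodity on at most $m$ edges costs $m$ calls to `update`, which is the $O(m \log k)$ term in $T_p$.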

We  can  summarize  this  in  the  following  corollary: 

Corollary 2.4.18 Let $T_{MCF}$ be the time to compute a minimum-cost flow and assume that Decongest terminates as soon as $f$ is $9\epsilon$-optimal. Then Procedure Decongest can be implemented to run in expected $O(\epsilon^{-4} k \log n (T_{MCF} + m \log k))$ time. If the initial flow is $O(\epsilon)$-optimal, it can be implemented to run in expected $O(\epsilon^{-3} k \log n (T_{MCF} + m \log k))$ time.


Proof: Plug the values from Lemmas 2.4.17 and 2.4.6 into (2.12). ■
In the analysis above, we have assumed that the algorithm terminates as soon as the flow is $9\epsilon$-optimal. In order to guarantee this condition, we would have to perform a check after each iteration. However, if we did perform a check after each iteration, we would spend more time performing checks than rerouting flow. If the termination checks are not to dominate, we need a strategy that checks for termination only once in every $k$ iterations on average. To do so, after each iteration we perform a termination check with probability $1/k$.

Lemma 2.4.19 Assume that, with probability $1/k$, after each iteration of Decongest we check whether $f$ is $9\epsilon$-optimal. Then the time bounds given in Corollaries 2.4.13 and 2.4.18 hold without any assumptions about termination.

Proof: We divide the iterations of the algorithm into two sets, those in which the flow is $9\epsilon$-optimal and those in which it is not.

First we focus on the case when the flow is not $9\epsilon$-optimal. From Lemmas 2.4.15 and 2.4.17, we see that in all cases, if the current flow $f$ is not $9\epsilon$-optimal, then one iteration of Decongest takes $\Omega(T_{MCF})$ time. We now add to these iterations a termination check, with probability $1/k$. By Lemma 2.4.10, a termination check takes $O(k T_{MCF})$ time, and therefore the expected running time increases by at most a constant factor.

If $f$ is $9\epsilon$-optimal, then the termination check recognizes it. However, since we only check with probability $1/k$, we expect to execute $O(k)$ iterations in which the flow is $9\epsilon$-optimal but no termination check is performed. But for both Randomized Strategy 1 and Randomized Strategy 2, $I_p = \Omega(k)$ and $I_u = \Omega(k)$. Therefore adding $k$ additional iterations does not increase the asymptotic running time. ■
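The two expectations used in this proof can be written out explicitly. The block below is a short worked calculation, not part of the original argument's notation: the check is an independent coin flip with success probability $1/k$ after each iteration, so the number of iterations until the first check is geometrically distributed.

```latex
% Expected cost of the randomized termination check, charged per iteration:
\mathbb{E}[\text{check cost per iteration}]
  = \frac{1}{k}\cdot O(k\,T_{MCF}) = O(T_{MCF}).

% Expected number of iterations until a check is performed,
% once the flow has become 9\epsilon-optimal:
\mathbb{E}[\text{iterations until first check}]
  = \sum_{j\ge 1} j\,\frac{1}{k}\Bigl(1-\frac{1}{k}\Bigr)^{j-1} = k.
```

The first line matches the $\Omega(T_{MCF})$ cost of an iteration up to a constant factor, and the second gives the $O(k)$ extra iterations.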

We  now  summarize  the  main  results  of  this  section. 

Theorem 2.4.20 Let $T_{MCF} = T_{MCF}(M)$ be the time to compute a minimum-cost flow for instance $M$. Then, assuming that exponentiation can be done in $O(1)$ time, the following table gives the times for procedure Decongest:

                   initial flow arbitrary                               initial flow $O(\epsilon)$-optimal
  Randomized       $O(\epsilon^{-4} k \log n (T_{MCF} + m \log k))$     $O(\epsilon^{-3} k \log n (T_{MCF} + m \log k))$
                   $O(\epsilon^{-4} k \log n (T_{MCF} + m + k))$        $O(\epsilon^{-3} k \log n (T_{MCF} + m + k))$
  Deterministic    $O(\epsilon^{-3} k^2 \log n \, T_{MCF})$             $O(\epsilon^{-2} k^2 \log n \, T_{MCF})$

Proof: This table summarizes Corollaries 2.4.16 and 2.4.18 and Lemma 2.4.11. ■

Putting  it  all  together:  A  Summary  of  Algorithm  Concurrent 

We can now give running times for algorithm Concurrent. We will give two sets of running times, one when the input is a simple $k$-commodity concurrent flow problem, and one when the input is a non-simple $k^*$-commodity concurrent flow problem. Given an instance of a simple $k$-commodity concurrent flow problem, we have two options. One option is to use the bounds for the simple concurrent flow problem. Alternatively, we can use Lemma 2.2.1 to create a $k^*$-commodity non-simple instance, use the time bounds for the non-simple concurrent flow problem, and then decompose the solution. This procedure takes $O(k^* m \log n + kn)$ time plus the time to solve a non-simple $k^*$-commodity concurrent flow problem. Although this second method may lead to faster running times, throughout the rest of the chapter we do not carry this calculation through. Also, we use the randomized running times given in the first line of the chart in Theorem 2.4.20 for simple instances and the second line for non-simple instances. Even though for some simple instances the second line of randomized running times given in the chart in Theorem 2.4.20 gives faster running times, we do not carry through this calculation.

Theorem 2.4.21 Let $T_{MCF} = T_{MCF}(M)$ be the time to compute a minimum-cost flow for instance $M$. Then, assuming that exponentiation can be performed in $O(1)$ time, the following table gives the running times for Algorithm Concurrent:


simple  instance 

non-simple  instance 

Randomized 

0(c“'*lrlogtlogn(rMCF  -1-  mlogi)) 

O(k’nmlog  log(nl/) 

logi*  logn(rMCF  +  m)) 

Deterministic 

0(e-^k^  log  k  log  n(TMCF  +  log  i)) 

Oifnmiog  log(nt/) 

-f(€~®t'^logi*logn(TMCF  +  »T»))) 
Proof: Combine Lemmas 2.4.1 and 2.4.2, Corollary 2.4.16, Corollary 2.4.18 and the fact that a maximum flow in an $n$-node $m$-edge graph can be computed in $O(nm \log(n^2/m))$ time [22]. Note that only in the non-simple case does the time for initialization appear in the final time bounds. ■

A  scaling  algorithm 

The dependence on $\epsilon$ given in Theorem 2.4.21 can be reduced somewhat through a technique we call $\epsilon$-scaling. Instead of calling Decongest with the value of $\epsilon$ given in the input, we call it with a series of values of $\epsilon$, each half of the previous value. The advantage of scaling is that at the beginning of each call to Decongest the initial flow is $O(\epsilon)$-optimal. Thus we can employ the bounds on Decongest from the second column of the table in Theorem 2.4.20 to obtain faster running times. The $\epsilon$-scaling we use is similar to that used by Goldberg and Tarjan in their minimum-cost flow algorithm [21]. The details of our scaling algorithm, ScalingConcurrent, appear in Figure 2.11.

Lemma 2.4.22 Let $T_C(\epsilon) = T_C(\epsilon, I)$ be the running time of Concurrent on input $(\epsilon, I)$. Let $T_D = T_D(I, \epsilon)$ be the running time of procedure Decongest given an $O(\epsilon)$-optimal input flow. Then, given an instance $I$ of a concurrent flow problem, algorithm ScalingConcurrent finds an $\epsilon$-optimal solution in $O(T_C(1/9) + T_D(\epsilon))$ time.

Proof: First we find a 1-optimal multicommodity flow using algorithm Concurrent with $\epsilon = 1/9$. The rest of the computation is divided into scaling phases. We start each phase by dividing $\epsilon$ by 2. Thus our current flow is $18\epsilon$-optimal with respect to the new $\epsilon$. The bounds given in the second column of Theorem 2.4.20 imply that the running time of Decongest, given an $O(\epsilon)$-optimal solution, is proportional to $\epsilon^{-c}$, where $c$ is 3 or 2, depending on whether the algorithm is randomized or deterministic. Since in each subsequent call to Decongest $\epsilon$ decreases by a factor of 2, the running times form a geometric series in $\epsilon^{-1}$. Hence the running time for the series is at most twice the time for the last call. ■
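The scaling loop and the geometric-series argument can be illustrated with a small driver. This is a sketch under assumptions: `concurrent` and `decongest` are caller-supplied placeholders for Algorithm Concurrent and procedure Decongest, not implementations of them, and `series_vs_last` is a hypothetical helper that only evaluates the $\epsilon^{-c}$ cost model used in the proof.

```python
def scaling_concurrent(instance, eps, concurrent, decongest):
    """Run the eps-scaling schedule; returns the final flow and the eps values used.

    concurrent(instance, eps0) and decongest(flow, eps_cur) are hypothetical
    callables standing in for the routines named in the text.
    """
    eps_cur = 1.0 / 9.0
    flow = concurrent(instance, eps_cur)
    schedule = []
    while eps_cur > eps:
        eps_cur /= 2.0
        flow = decongest(flow, eps_cur)
        schedule.append(eps_cur)
    return flow, schedule


def series_vs_last(schedule, c):
    """Return (sum of eps'^(-c) over all Decongest calls, eps'^(-c) of the last call)."""
    terms = [e ** (-c) for e in schedule]
    return sum(terms), terms[-1]
```

Because consecutive terms grow by a factor of $2^c \ge 2$, the sum returned by `series_vs_last` is at most twice its last term, which is the bound used in the proof.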

Goldberg [20] and Grigoriadis and Khachiyan [26] have shown how to reduce the running time of our randomized algorithms by an $\epsilon^{-1}$ factor. Goldberg gives a somewhat simplified version of our proof that leads to a randomized selection strategy that avoids having to search for an $\epsilon$-bad commodity. Grigoriadis and Khachiyan generalize our algorithm to solve certain types of convex programming problems. Their algorithm, when specialized to the case of solving multicommodity flows, also avoids searching for an $\epsilon$-bad commodity.

ScalingConcurrent(I, ε)
    f ← Concurrent(I, 1/9).
    ε' ← 1/9.
    while (ε' > ε)
        ε' ← ε'/2.
        f ← Decongest(f, ε').
    return f

Figure 2.11: Algorithm ScalingConcurrent

We  now  summarize  the  results  for  Algorithm  ScalingConcurrent  so  far: 

Theorem 2.4.23 Let $T_{MCF} = T_{MCF}(M)$ be the time to compute a minimum-cost flow for instance $M$. Then the following table gives the running times of Algorithm ScalingConcurrent, assuming exponentiation can be implemented in $O(1)$ time:

                   simple instance                                               non-simple instance
  Randomized       $O((\epsilon^{-3} + \log k) k \log n (T_{MCF} + m \log k))$    $O(k^* nm \log(n^2/m) \log(nU) + (\epsilon^{-3} + \log k^*) k^* \log n (T_{MCF} + m))$
  Deterministic    $O((\epsilon^{-2} + \log k) k^2 \log n \, T_{MCF})$            $O(k^* nm \log(n^2/m) \log(nU) + (\epsilon^{-2} + \log k^*) (k^*)^2 \log n (T_{MCF} + m))$

Proof: Combine Theorem 2.4.21, Corollary 2.4.16, Corollary 2.4.18, Lemma 2.4.22, Theorem 2.4.20 and the fact that a maximum flow in an $n$-node $m$-edge graph can be computed in $O(nm \log(n^2/m))$ time [22]. Note that only in the non-simple case does the time for initialization appear in the final bound. ■

Given an instance of a concurrent flow problem, the running time for Algorithm ScalingConcurrent is never greater than that of Algorithm Concurrent. For the rest of this chapter, we will quote the running times for algorithm ScalingConcurrent.
2.4.2 Dealing with Exponentiation

In the previous section, we assumed a non-standard model of computation in which exponentiation takes $O(1)$ time. In this section, we show how to implement an iteration of procedure Decongest in the standard RAM model of computation, achieving the same time bounds. Our approach is to show that even if we require that all variables be represented using $O(\log(nU))$ bits, we can still, in each iteration, achieve the same decrease in $\Phi$, up to constant factors.

More specifically, we first show that a flow that satisfies a relaxed set of minimum-cost flow constraints suffices to achieve a decrease in $\Phi$ of $\Omega((\epsilon^2/k)\Phi)$. We then show that a flow satisfying a second set of relaxed constraints can be modified in $O(m)$ time to satisfy the first set of relaxed constraints while having the additional property that the resulting flow can be represented in $O(\log(nU))$ bits per commodity/edge pair. We then give an approximate length function that uses $O(\log(nU))$ bits per edge, which can be used in a minimum-cost flow algorithm to produce a flow that satisfies the second set of relaxed constraints.

Each iteration of procedure Decongest, as described in Figure 2.7, computes $f_i^*$, which is a flow that satisfies the demands of commodity $i$ subject to capacity constraints $\lambda u(vw)$ on each edge $vw$, and minimizes $C_i^* = \sum_{vw \in E} |f_i^*(vw)|\,\ell(vw)$. Instead, we compute an approximation $\bar f_i$ to $f_i^*$. The flow $\bar f_i$ can have cost somewhat more than the cost of $f_i^*$, and it may satisfy slightly relaxed capacity constraints. The key to showing that this flow can be used in the algorithm instead of $f_i^*$ is to prove a relaxed version of Lemma 2.4.8.


Lemma  2.4.24  Let  Ci  denote  the  cost  of  the  current  flow  of  commodity  t  with  respect  to 
the  current  length  function,  and  let  be  a  flow  that  satisfies  demands  of  commodity  i  and  the 
constraints 


fi(vw)  < 


2Xu{vw)  \/vw  €  E 


(2.18) 


Then,  if  we  use  f’  instead  of  /*  in  Decongest  with  55^  <  bound  the 

decrease  of  the  potential  function  by  n(y$). 


Proof :  The  difference  between  this  proof  and  that  of  Lemma  2.4.8  is  as  follows.  Here  we  can 
conclude  that  |6(vin)|  <  iaXu{vw),  and  |t;|  <  3o<tA  <  3c/16.  We  use  that  |q|  <  3c/16  <1/4 
implies  c*'*’’’  <  e*  +  ije*  +  Igjiyle*.  We  modify  equation  (2.13)  appropriately,  and  conclude  that

In  fact,  we  do  not  find  such  a  flow  directly.  Instead,  we  compute  a  flow  that  satisfies  the 
somewhat  tighter  constraints, 

f:(vw)  < 

< 

We  then  modify  this  flow  slightly  so  that  it  satisfies 
represented  by  0(\og(nU))  bits  per  edge. 

Lemma 2.4.25 Let $\tilde f_i$ be a flow that satisfies conditions (2.19). Then, in $O(m)$ time, we can convert it into a flow $\bar f_i$ that satisfies (2.18) and such that $(1 - \sigma) f_i(vw) + \sigma \bar f_i(vw)$ can be represented in $O(\log(nU))$ bits per edge.

Proof: Given the flow $\tilde f_i$, we first compute the flow $(1 - \sigma) f_i + \sigma \tilde f_i$, where $\sigma$ is chosen as in Lemma 2.4.24. We then round the flow on edge $vw$ to an integer multiple of $\nu = \epsilon^2/(128 m^2 k \alpha)$. The maximum possible flow value is $\lambda U$. Dividing the range of flow values by $\nu$ and substituting for $\alpha$, we see that the number of possible flow values is

    $$\frac{\lambda U}{\epsilon^2/(128 m^2 k \alpha)} = \frac{256 \lambda U m^2 k (1 + \epsilon) \ln(m\epsilon^{-1})}{\lambda \epsilon^3}.$$

The number of bits needed is just the logarithm of the number of possible values. Recall that $\epsilon$ is inverse polynomial in $n$, $k$ is polynomial in $n$ and $m = O(n^2)$; thus a flow value can be represented in $O(\log(nU))$ bits. Observe that if we just rounded the flow on each edge $vw$ to the nearest integer multiple of $\nu$, we would have no guarantee that the flow conservation constraints of equation (2.4) or (2.1)-(2.3) are still satisfied. Thus, we must round more carefully. Let $T$ be a spanning tree in the graph. We round the flow on all the non-tree edges to the nearest multiple of $\nu$. This rounded flow does not necessarily satisfy the conservation constraints, so we use the tree edges to correct for the violations we may have introduced. It is easy to see that by computing the flow values on the edges of $T$ in postorder we can carry out this step in $O(m)$ time. Observe that the amount of flow we had to add to any non-tree edge is at most $\nu$


|Au(vtn)  Yvw  €  E 


(2.19) 


C;  +  }(eC,  +  ,i4). 

conditions  (2.18)  and  the  new  flow  can  be 




and the amount that we had to add to any tree edge is at most $m\nu$, since the flow on a tree edge may have to correct for the violation across the cut defined by deleting that edge in the tree.

The resulting rounded flow implicitly defines a flow $\bar f_i$, as it can be written as $(1 - \sigma) f_i + \sigma \bar f_i$ for an appropriately chosen $\bar f_i$. The flow $\bar f_i$ on edge $vw$ is $\tilde f_i(vw)$ plus $\sigma^{-1}$ times the rounding error on the edge. We now show that it satisfies the conditions (2.18). The rounding error on any edge is at most $m\nu$, and therefore for each edge, $\bar f_i(vw) \le \tilde f_i(vw) + \sigma^{-1} m \nu$. Plugging

in  the  bounds  on  fi{vw)  from  (2.19)  and  the  values  of  a  and  1/,  we  get  an  upper  bound  of 
|Ait(t;if;)+  lA.  Since  u(vtt;)  is  integral,  we  conclude  that  J*{vw)  <  2Xu{vw).  We  bound  the 
cost  of  f*  as  follows. 


<  (Evu,6£  |/i*(»«^)|)  +  m(<r-‘mi/c“*) 

<  c;  +  i  {(Ci  +  +  (by  (2.19)  and  the 

definitions  of  <t  and  i/) 

^  Q  +  2  +  ^)  (using  ♦  >  c“*). 


Therefore  we  have  satisfied  the  conditions  of  the  theorem.  ■ 

Combining  the  previous  two  theorems  we  get  the  following  corollary: 


Corollary 2.4.26 Let $\tilde f_i$ be a flow satisfying equations (2.19). Let $\bar f_i$ be the flow obtained from $\tilde f_i$ via the procedure described in Lemma 2.4.25. Then if we use $\bar f_i$ instead of $f_i^*$ in Decongest with $\epsilon/(32\alpha\lambda) \le \sigma \le \epsilon/(16\alpha\lambda)$, we can bound the decrease of the potential function during one iteration of Decongest by $\Omega((\epsilon^2/k)\Phi)$, while maintaining a flow represented by $O(\log(nU))$ bits per edge. ■


Now we show how to compute a flow that satisfies (2.19). We could do so by finding a minimum-cost flow with respect to the exact length function $\ell$. Unfortunately, this length function is exponential in the size of the input, and computing it exactly might take too long. Instead, we will describe how to compute an approximate length function $\tilde\ell$, such that the flow that has minimum cost with respect to $\tilde\ell$ has cost at most $C_i^* + \epsilon\lambda\Phi/(8k)$ with respect to $\ell$. By Corollary 2.4.26, such a flow can be used in order to implement the rerouting step in our algorithm.

The new length function $\tilde\ell$ is integral, consists of $O(\log(nU))$ bits per edge, is approximately related to $\ell$ by the scalar multiplier $\gamma = \epsilon e^{\alpha\lambda}/(16Umk)$, and satisfies $\gamma\tilde\ell(vw) \le \ell(vw)$ on each edge $vw$. We shall show that it takes $O(\log n)$ time to compute $\tilde\ell(vw)$ for each edge $vw$. In the following we will use $\tilde C_i$ and $\tilde C_i^*$ to denote the current cost and the minimum cost of commodity $i$ with respect to length $\tilde\ell$, respectively.

For each edge, first we compute $e^{\alpha(f(vw)/u(vw) - \lambda)}$ approximately so that it has at most $\zeta = \epsilon/(16km)$ additive error, then we multiply the result by $\zeta^{-1}U$, divide by $u(vw)$, take the integer part, and set $\tilde\ell(vw)$ to be this value. Using the Taylor series we can compute one bit of $e^x$ in $O(1)$ time. Since $e^{\alpha(f(vw)/u(vw) - \lambda)}$ is at most 1 on each edge, it is sufficient to compute $O(\log(1/\zeta))$ bits to achieve the desired approximation. Computing the approximate length function takes $O(\log(1/\zeta)) = O(\log n)$ time for each edge, and $O(m \log n)$ time in total.
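The construction of the integral length function can be sketched numerically. This is an illustration under assumptions that are not stated explicitly in this passage: the exact length function is taken to be $\ell(vw) = e^{\alpha f(vw)/u(vw)}/u(vw)$, capacities are positive integers with maximum $U$, and the truncated exponential is obtained from floating-point `math.exp` rather than from a bit-by-bit Taylor expansion. The function names are hypothetical.

```python
import math


def approx_length(f, u, lam, alpha, U, zeta):
    """Integer surrogate for l = exp(alpha * f / u) / u (assumed form).

    f: total flow on the edge, u: integer capacity, lam: maximum congestion
    (so f / u <= lam), U: maximum capacity, zeta: allowed additive error.
    """
    x = math.exp(alpha * (f / u - lam))   # lies in (0, 1] because f / u <= lam
    q = math.floor(x / zeta)              # q * zeta is x truncated to a multiple of zeta
    return (q * U) // u                   # integer part of (q * zeta) * (U / zeta) / u


def gamma(lam, alpha, U, zeta):
    """Scalar that maps the surrogate back to the scale of the exact length."""
    return math.exp(alpha * lam) * zeta / U
```

With these definitions, $\gamma\tilde\ell(vw)$ never exceeds $\ell(vw)$ (up to floating-point round-off), and the gap is at most $\gamma(U/u(vw) + 1)$: the truncation contributes at most $\gamma U/u(vw)$ and the final integer rounding at most $\gamma$.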

Because of the approximation and the integer rounding, a flow $\tilde f_i^*$ which has minimum cost with respect to $\tilde\ell$ is not necessarily a minimum-cost flow with respect to $\ell$. We will show, however, that a flow that is minimum-cost with respect to $\tilde\ell$ satisfies conditions (2.19).

Lemma 2.4.27 Let $\tilde f_i^*$ be a flow that is minimum cost with respect to the costs $\tilde\ell$ defined above. Then $\tilde f_i^*$ has cost (with respect to $\ell$) at most $\epsilon\lambda\Phi/(8k)$ more than the actual minimum-cost flow with respect to $\ell$.

Proof: Recall that $\gamma = \epsilon e^{\alpha\lambda}/(16Umk)$ and $\zeta = \epsilon/(16mk)$. We bound the difference between $\ell$ and $\gamma\tilde\ell$, a scaled-up version of the approximate length function. In computing $\gamma\tilde\ell$, we introduce errors in two places. First, when computing $e^{\alpha(f(vw)/u(vw) - \lambda)}$ to a precision of $\zeta$, we introduce an error of $\zeta$. This error gets scaled up by $\zeta^{-1}U/u(vw)$ when we scale up and gets increased by 1 when we round $\tilde\ell$ down to an integer. Finally, if we scale $\tilde\ell$ back to be compatible with $\ell$, the whole error gets scaled by $\gamma$. Thus,

    $$\ell(vw) - \gamma\tilde\ell(vw) \le \gamma\left(\zeta \cdot \frac{\zeta^{-1}U}{u(vw)} + 1\right) = \gamma\left(\frac{U}{u(vw)} + 1\right). \qquad (2.20)$$

We defined $\tilde\ell$ so that $\gamma\tilde\ell(vw) \le \ell(vw)$ on each edge, and hence we have that

    $$\gamma \tilde C_i^* \le C_i^*. \qquad (2.21)$$

Using these two equations and the fact that $\Phi \ge e^{\alpha\lambda}$, we get that:

    $$\sum_{vw \in E} |\tilde f_i^*(vw)|\,\ell(vw) - C_i^* \le \sum_{vw \in E} |\tilde f_i^*(vw)|\,\ell(vw) - \gamma \tilde C_i^* \qquad \text{(by (2.21))}$$
    $$= \sum_{vw \in E} |\tilde f_i^*(vw)| \left(\ell(vw) - \gamma\tilde\ell(vw)\right)$$
    $$\le \sum_{vw \in E} |\tilde f_i^*(vw)|\,\gamma\left(\frac{U}{u(vw)} + 1\right) \qquad \text{(by (2.20))}$$
    $$\le \sum_{vw \in E} \lambda u(vw)\,\gamma\left(\frac{2U}{u(vw)}\right) = 2m\lambda U\gamma = \frac{\epsilon\lambda e^{\alpha\lambda}}{8k}$$
    $$\le \frac{\epsilon\lambda\Phi}{8k}.$$

■

Notice that this flow actually satisfies slightly stronger conditions than (2.19). We shall use this stronger condition in Subsection 2.4.3.

In the randomized implementation, where we used the cost of the current flow $C_i$ for the selection of a bad commodity $i$, we shall now use the rounded cost $\tilde C_i$ instead. One can show that the rounding error is small, and therefore using $\tilde C_i$ does not significantly decrease the probability that a bad commodity is selected.

To summarize, we have just described how to implement Decongest in the RAM model of computation. We first compute an approximation $\tilde\ell$ to the length function $\ell$. Then we compute the approximate cost of each commodity and choose, either randomly or deterministically, a commodity to reroute. Next, we compute an approximate minimum-cost flow for that commodity with respect to the costs $\tilde\ell$. This process gives us an approximate minimum-cost flow that satisfies equations (2.19). We then update the flows for commodity $i$. Finally, we modify the updated flow as described in Lemma 2.4.25, represent it in $O(\log(nU))$ bits per edge, and start the next iteration. As the above discussion shows, the time to perform the whole computation is $O(m \log n)$ plus the time to compute a minimum-cost flow.

We can now update Theorem 2.4.23 to remove the assumption that exponentiation takes $O(1)$ time.

Theorem 2.4.28 Let $T_{MCF} = T_{MCF}(M)$ be the time to compute a minimum-cost flow for instance $M$. Then the following table gives the times to find an $\epsilon$-optimal solution to the concurrent flow problem in the RAM model:

                   simple instance                                               non-simple instance
  Randomized       $O((\epsilon^{-3} + \log k) k \log n (T_{MCF} + m \log k))$    $O(k^* nm \log(n^2/m) \log(nU) + (\epsilon^{-3} + \log k^*) k^* \log n (T_{MCF} + m))$
  Deterministic    $O((\epsilon^{-2} + \log k) k^2 \log n \, T_{MCF})$            $O(k^* nm \log(n^2/m) \log(nU) + (\epsilon^{-2} + \log k^*) (k^*)^2 \log n (T_{MCF} + m))$

2.4.3  Implementing  One  Iteration 

In this subsection we consider the problem of choosing the appropriate minimum-cost flow routine to find a minimum-cost flow subject to the costs $\tilde\ell(vw)$. In some cases we only compute an approximate minimum-cost flow subject to cost $\tilde\ell$ by further rounding the costs before the minimum-cost flow computation. In all cases, however, we find a flow that satisfies (2.19).

Different situations require different choices. For general concurrent flow problems, the best choice seems to be either the algorithm of Goldberg and Tarjan [22] or that of Ahuja, Goldberg, Orlin and Tarjan [2]. For concurrent flow with uniform capacity, we use Gabow and Tarjan's [18] algorithm for the assignment problem. When both the demands and capacities are uniform, we use the algorithm that iteratively computes shortest paths in the residual graph with nonnegative costs, discovered independently by Ford and Fulkerson [29] and Yakovleva [71]. First, we consider the general concurrent flow problem.

Lemma 2.4.29 For a commodity $i$, a minimum-cost flow with respect to $\tilde\ell$ can be found in $O(nm \log(n^2/m) \log(nU))$ time.

Proof: The Goldberg-Tarjan minimum-cost flow algorithm runs in $O(nm \log(n^2/m) \log(nC))$ time, where $C$ is the maximum edge cost, assuming that the costs are integral. Recall that we are only using $O(\log nU)$ bits to represent the integral edge costs, and hence $\log(nC) = O(\log nU)$. ■

The above bound can be improved if the capacities are small relative to $n^2/m$. In this case we round the demands and solve this rounded problem using the double scaling algorithm of Ahuja, Goldberg, Orlin, and Tarjan [2]. We then satisfy the remaining flow on arbitrary paths. This flow still satisfies (2.19) and the rounding allows us to use a faster algorithm. We will prove the following lemma:

Lemma 2.4.30 For a commodity $i$, a flow satisfying (2.19) can be found in $O(nm \log(nU) \log\log(nU))$ time.

Proof:  Assume  without  loss  of  generality  that  is  an  integer  and  define  ft  =  Ae/(16nl;).  We 
round  the  demands  for  commodity  i  to  integer  multiples  of  fi  such  that  the  absolute  value  of 
each  demand  does  not  increase,  the  rounded  demands  still  sum  to  zero,  and  the  total  decrease  in 
the  absolute  values  of  the  demands  is  at  most  2n/i.  (Recall  that  each  node  may  have  a  positive 
or  a  negative  demand.)  To  achieve  these  constraints,  we  round  all  but  one  of  the  demands  to 
the  next  smallest  multiple  of  /i  if  the  demand  is  positive  and  to  the  next  largest  multiple  of  /i 
if  the  demand  is  negative.  The  last  demand  can  then  be  rounded  by  at  most  to  ensure  that 
the  rounded  demands  still  sum  to  0. 
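
The rounding step can be illustrated with a short sketch. It is not the thesis's exact procedure: instead of adjusting one designated "last" demand, this variant truncates every demand toward zero and then shrinks the heavier side (supply or demand) by multiples of $\mu$ until the totals balance, which keeps every absolute value from increasing. The function name `round_demands` and the use of exact rationals are assumptions made for illustration.

```python
from fractions import Fraction

def round_demands(demands, mu):
    """Round node demands (which sum to zero) to multiples of mu.

    Illustrative variant: no absolute value increases, the result sums
    to zero, and the total decrease in absolute values is at most 2*n*mu.
    """
    mu = Fraction(mu)
    d = [Fraction(x) for x in demands]
    assert sum(d) == 0, "demands must sum to zero"
    # int() truncates toward zero, so positive demands are rounded down
    # and negative demands are rounded up.
    r = [mu * int(x / mu) for x in d]
    # After truncation the two sides may be unbalanced by a multiple of mu.
    surplus = sum(r)
    sign = 1 if surplus > 0 else -1
    excess = abs(surplus)
    for j, x in enumerate(r):
        if excess == 0:
            break
        if x * sign > 0:  # node on the heavier side
            cut = min(abs(x), excess)
            r[j] = x - sign * cut
            excess -= cut
    return r

rounded = round_demands([2, Fraction(-9, 10), Fraction(-11, 10)], 1)
print(rounded)
```

In this example the truncated demands are 2, 0, -1, so one unit is removed from the supply node to restore balance.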

Since the absolute value of the demand for commodity $i$ has not increased at any node,
there must exist a flow satisfying these demands with cost at most $C_i^*$, subject to costs $\ell$.

Both the demands and the capacities are integral multiples of $\mu$. If we divide both the
demands and the capacities by $\mu$, we get a problem where the maximum capacity of an edge is
$\lambda U/\mu = 16Unk\epsilon^{-1}$. We can then use the double scaling algorithm of Ahuja, Goldberg, Orlin
and Tarjan [2] for solving the minimum-cost problem with rounded demands. This algorithm
takes $O(nm \log(nC) \log\log(nU'))$ time on a graph with maximum capacity $U'$. Plugging in the
value of $C$ from Lemma 2.4.29 and $U' = 16Unk\epsilon^{-1}$ yields the time bound. By Lemma 2.4.27,
this procedure gives a flow that satisfies the capacity constraints $\lambda u(vw)$ and has cost at most
$\epsilon\lambda\Phi/(8k)$ more than the minimum cost but does not satisfy the demands. We then satisfy the
remaining demands by arbitrary paths from nodes with excess to nodes with deficit. The last
step increases the flow on an edge by no more than $2n\mu = \epsilon\lambda/(8k) \le \epsilon\lambda u(vw)/(8k)$, and adds
a total of no more than $2n\mu \sum_{vw \in E} \ell(vw) \le \epsilon\lambda\Phi/(8k)$ to the cost of the flow subject to costs $\ell$.

Combining  the  minimum-cost  flow  with  the  flows  on  the  additional  paths,  we  get  a  flow 
that  satisfies  (2.19)  and  proves  the  lemma.  ■ 

In the case of the simple concurrent flow problem we can make the time required for solving
the minimum-cost flow problem independent of $U$.

Lemma 2.4.31 For the simple concurrent flow problem, a flow of a commodity $i$ satisfying (2.19)
can be found in the minimum of $O(nm \log n \log(n^2/m))$ and $O(nm \log n \log\log n)$ time.

Proof: We reduce $d_i$ by a factor of $(1 - \epsilon/8)$. We then find a flow $f_i'$ that satisfies the reduced
demand $d_i' = (1 - \epsilon/8)d_i$ and whose cost with respect to $\ell$ is no more than
$\epsilon\lambda \sum_{vw \in E} \ell(vw)u(vw)/(16k)$ above the minimum cost. Then, we multiply the flow on each edge
by $(1 - \epsilon/8)^{-1}$. This process gives a flow that satisfies demands, obeys the slightly increased
capacity constraints $(1 - \epsilon/8)^{-1}\lambda \cdot u(vw)$, and has cost (subject to $\ell$) at most $\epsilon C_i/4 + \epsilon\lambda\Phi/(4k)$
above $C_i^*$, where $\Phi$ is the current potential function value. By Lemma 2.4.25, we can use this
flow and still get the same asymptotic improvement in the potential function.

Define $\mu' = \epsilon d_i/(8m)$, and round the capacities $\lambda u(vw)$ used for the minimum-cost flow
problem down to multiples of $\mu'$. One can show that the cost of the minimum-cost flow with respect to $\ell$
that satisfies the decreased demand $d_i'$ and the rounded capacities is no more than $C_i^*$.

The demand and capacities in this rounded problem are integer multiples of $\mu'$. Therefore,
there exists a minimum-cost flow where the flow on the edges is a multiple of $\mu'$. This flow does
not use edges whose cost is more than $C_i^*/\mu'$, and hence these edges can be deleted for the
minimum-cost flow computation.

For getting the approximate minimum-cost flow we can work with a further rounded length
function. We take $\tilde\ell(vw)$ to be the integer part of $16\epsilon^{-1}km\,d_i\,\ell(vw)/(\lambda\Phi)$. Since after the capacity
rounding we consider only edges with $\lambda u(vw) \ge \mu'$, and since $\ell(vw)u(vw) \le \Phi$, we have

$$\tilde\ell(vw) \le \frac{16\epsilon^{-1}km\,\Phi\,d_i}{\lambda\Phi\,u(vw)} \le \frac{16\epsilon^{-1}km\,d_i}{\mu'} = O(\epsilon^{-2}km^2).$$

Therefore the Goldberg-Tarjan minimum-cost flow algorithm runs in $O(nm \log(n^2/m) \log n)$
time on this problem.

Now we show that the resulting flow, after multiplication by $(1 - \epsilon/8)^{-1}$, satisfies (2.19).
The minimum-cost flow has a single source and a single sink and non-negative costs. Therefore,
no edge carries more than $d_i'$ units of flow. Let $\tilde f_i$ be a minimum-cost flow with respect to $\tilde\ell$.
By an argument similar to the proof of Lemma 2.4.27, the cost of this flow with respect to $\ell$ is
at most $m d_i' \cdot (\lambda\Phi/d_i) \cdot \epsilon/(16km) \le \epsilon\lambda\Phi/(16k)$ larger than the cost of $f_i^*$ with respect to $\ell$,
where $f_i^*$ is the minimum-cost flow with respect to $\ell$ that satisfies the reduced demand $d_i'$. Now
Lemma 2.4.27 implies that (2.19) is satisfied.

For all but very dense graphs, the double scaling algorithm of Ahuja, Goldberg, Orlin and
Tarjan [2] gives a better bound. As we observed, no edge carries more than $d_i'$ units of flow in the
optimal flow of commodity $i$. Thus, we can also limit capacities to be no more than $d_i'$. That
is, we can set $u'(vw) = \min\{\lfloor \lambda u(vw)/\mu' \rfloor \mu',\ d_i'\}$. With this modification, the largest capacity is at
most $d_i' = (8\epsilon^{-1} - 1)m\mu'$. The demand and the capacities are multiples of $\mu'$. Dividing through
by the scale factor $\mu'$ yields a problem with integral capacities using $O(\log n)$ bits. ■

Combining  Theorem  2.4.28  and  Lemmas  2.4.2,  2.4.29,  2.4.30  and  2.4.31  we  obtain  the 
following  theorem  that  summarizes  the  results  for  the  general  case: 


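Theorem 2.4.32 For $\epsilon > 0$, algorithm ScalingConcurrent finds an $\epsilon$-optimal solution for
the simple concurrent flow problem in the following time bounds:

Randomized
  simple instance:      $O((\epsilon^{-3} + \log k)\, k\, nm \log^2 n \log(n^2/m))$
                        $O((\epsilon^{-3} + \log k)\, k\, nm \log^2 n \log\log n)$
  non-simple instance:  $O((\epsilon^{-3} + \log k^*)\, k^*\, nm \log n \log(nU) \log(n^2/m))$
                        $O((\epsilon^{-3} + \log k^*)\, k^*\, nm \log n \log(nU) \log\log(nU))$

Deterministic
  simple instance:      $O((\epsilon^{-2} + \log k)\, k^2\, nm \log^2 n \log(n^2/m))$
                        $O((\epsilon^{-2} + \log k)\, k^2\, nm \log^2 n \log\log n)$
  non-simple instance:  $O((\epsilon^{-2} + \log k^*)\, k^{*2}\, nm \log n \log(nU) \log(n^2/m))$
                        $O((\epsilon^{-2} + \log k^*)\, k^{*2}\, nm \log n \log(nU) \log\log(nU))$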

2.5  The  Unit  Capacity  Case 

An  important  special  case  of  the  concurrent  flow  problem  occurs  when  the  edge  capacities  are 
all  1.  One  way  to  solve  this  problem  is  to  use  the  algorithm  for  the  general  case,  but  with 
a  minimum-cost  flow  algorithm  more  suited  to  graphs  with  unit  capacities.  We  will  discuss 
this  approach  in  Section  2.5.1.  Sections  2.5.2  through  2.5.5  develop  and  use  a  framework  for 
solving  uniform  capacity  concurrent  flow  problems  via  the  solution  of  a  series  of  shortest  path 
problems.  As  much  of  the  work  needed  to  develop  such  algorithms  is  identical  to  that  done  in 
Section 2.4, we omit some of the details. The algorithms are sufficiently different, however, to
merit a fairly involved presentation. For the unit capacity case, we only deal with the simple
concurrent flow problem. It is possible to modify the results of this section to give algorithms
for the non-simple concurrent flow problem, but we do not pursue that here.

2.5.1  Using  the  Results  for  the  General  Case 

One approach to solving the unit capacity case is to use the algorithm for the general case. For
efficiency, we modify the minimum-cost flow algorithm that we use. This subsection discusses
this approach.

If  the  capacities  in  the  concurrent  flow  problem  are  uniform,  then  the  capacities  in  each 
minimum-cost  flow  problem  are  all  equal  to  A.  In  this  case,  there  are  more  efficient  minimum- 
cost  flow  algorithms  than  the  ones  mentioned  in  the  previous  section. 

Lemma 2.5.1 For the simple concurrent flow problem with uniform capacities, a flow for a
commodity $i$ satisfying (2.19) can be found in $O(m^{3/2} \log n)$ time.

Proof: Since the concurrent flow problem is simple and has unit capacity, the auxiliary minimum-cost
flow problem has one source, one sink and edge capacities all equal to $\lambda$. The minimum-cost
flow problem is guaranteed to have a feasible solution. Therefore, it must be the case that the
demand $d_i \le m\lambda$, the total capacity in the network. Let

d\  = 

< 

< 


and 


1  Vutn  €  E. 


We solve this rounded problem and then route the remaining flow on a single path. By
arguments similar to those in Lemma 2.4.30, this procedure yields a flow that satisfies (2.19).
To solve the minimum-cost flow problem with capacities $u'$ and demands $d_i'$, we can use an
algorithm of Gabow and Tarjan [18]. Gabow and Tarjan show how to modify their scaling
algorithm for the assignment problem to find a solution to a single-source single-sink minimum-cost
flow problem with $m$ edges, all edge capacities equal to 1, and demand at most $d_i'$, in
$O((m + d_i')^{3/2} \log(n d_i'))$ time. Plugging in the bound $d_i' \le m$, we obtain the time bound
claimed in the statement of the lemma. ■

When both the capacities and demands are uniform and $k$ is relatively large, we can obtain
better performance by using the Ford and Fulkerson [29] minimum-cost flow algorithm, which
iteratively computes shortest paths in the graph of residual edges with nonnegative costs. Since
each capacity is an integer multiple of $\lambda$ and the lengths are non-negative, a minimum-cost
flow of demand $d$ can be computed by $\lceil d/\lambda \rceil$ shortest path computations in networks with
nonnegative edge lengths.

The demands are also uniformly equal to 1, thus $\lambda \ge k/m$, since there are at least $k$
units of flow divided between $m$ edges. Therefore, in this case, the number of shortest path
computations required for finding a minimum-cost flow is at most $O(m/k + 1)$. Each shortest
path can be computed in $O(m + n \log n)$ time [17]. Thus we have shown:

Lemma 2.5.2 For the simple concurrent flow problem with uniform capacities and demands, a
flow for a commodity $i$ satisfying (2.19) can be found in $O(\frac{m}{k}(m + n \log n))$ time.

By incorporating Lemma 2.5.2 or Lemma 2.5.1 into Theorem 2.4.32, one can derive running
time bounds for algorithm ScalingConcurrent for the special cases when the capacities are
uniform and when both the capacities and demands are uniform. Observe that when both the
capacities and demands are uniform, the time to find a minimum-cost flow, $O(\frac{m}{k}(m + n \log n))$,
may be less than $\Theta(m \log k)$, the time to select a commodity. Because of this case alone, the
bounds in Theorem 2.4.20 contain the extra $O(m \log k)$ term.

Theorem 2.5.3 For $\epsilon > 0$, an $\epsilon$-optimal solution for the simple concurrent flow problem

• with uniform capacities can be found in expected $O((\epsilon^{-3} + \log k)\, k\, m^{3/2} \log^2 n)$ time and in
$O((\epsilon^{-2} + \log k)\, k^2\, m^{3/2} \log^2 n)$ time deterministically,

• with uniform capacities and demands can be found in expected
$O((\epsilon^{-3} + \log k)\, m \log n\, (m + n \log n + k \log k))$ time and in
$O((\epsilon^{-2} + \log k)\, k\, m \log n\, (m + n \log n + k \log k))$ time deterministically.

2.5.2 Solving Unit Capacity Concurrent Flow Problems

In this section, we rederive algorithms for the special case of the concurrent flow problem with
unit capacities. These are based on treating flow as a collection of paths, rather than as a set
of commodities, and lead to improved running times for some cases. In this section, we describe
the procedure Reduce that is the core of our approximation algorithms, and prove bounds on
its running time. Given a multicommodity flow $f$, procedure Reduce modifies $f$ until either $f$
becomes $\epsilon$-optimal or $\lambda$ is reduced below a given target value. The approximation algorithms
presented in the next two sections repeatedly call procedure Reduce to decrease $\lambda$ by a factor
of 2, until an $\epsilon$-optimal solution is found.

As before, for simplicity of presentation, we shall assume for now that the value of the length
function $\ell(vw) = e^{\alpha f(vw)}$ of edge $vw$ can be computed in one step from $f(vw)$
and represented in a single computer word. In Section 2.5.4 we will remove this assumption
and show that it is sufficient to compute an approximate length function, and that an
approximate length function can be computed quickly.

Procedure Reduce (see Figure 2.12) is the analog of procedure Decongest. It takes as
input a multicommodity flow $f$, a target value $\tau$, an error parameter $\epsilon$, and a flow quantum
$\sigma_i$ for each commodity $i$. We require that each flow path comprising $f_i$ carries flow that is an
integer multiple of $\sigma_i$. The procedure repeatedly reroutes $\sigma_i$ units of flow from an $\epsilon$-long path of
commodity $i$ to a shortest path for commodity $i$. We will need a technical granularity condition
that $\sigma_i$ is small enough for each $i$ to guarantee that approximate optimality is achievable through
such reroutings. In particular, we assume that when Reduce is called, we have

$$\sigma_i \le \frac{\epsilon^2 \tau}{3 \log(m\epsilon^{-1})}, \qquad i = 1, \ldots, k. \qquad (2.23)$$

Upon termination, the procedure outputs an improved multicommodity flow $f$ such that
either $\lambda$ is less than the target value $\tau$ or $f$ is $\epsilon$-optimal. In this section, we assume that $\epsilon < 1/12$,
the bound on $\epsilon$ needed to prove Theorem 2.3.5.

Reduce($f$, $\tau$, $\epsilon$, $\sigma_i$ for $i = 1, \ldots, k$)
    $\alpha \leftarrow (1 + \epsilon)\tau^{-1}\epsilon^{-1} \log(m\epsilon^{-1})$.
    while $\lambda \ge \tau$ and $f$ and $\ell$ are not $\epsilon$-optimal
        For each edge $vw$, $\ell(vw) \leftarrow e^{\alpha f(vw)}$.
        Call FindPath($f$, $\ell$, $\epsilon$) to find an $\epsilon$-long flow path $P$ and a short path $Q$ with the same endpoints as $P$.
        Reroute $\sigma_i$ units of flow from $P$ to $Q$.
    return $f$.

Figure 2.12: Procedure Reduce.

In the remainder of this section, we analyze the procedure Reduce shown in Figure 2.12.
First, we show that if the granularity condition is satisfied, the number of iterations in Reduce
is small. Second, we give an even smaller bound on the number of iterations for the case in
which the flow $f$ is $O(\epsilon)$-optimal when Reduce is called. Finally, we will give two algorithms,
Unit and ScalingUnit, that bound the number of iterations needed to solve unit capacity
concurrent flow problems. The former solves the case when $\epsilon = O(1)$, while the latter solves
the case when $\epsilon = o(1)$. As in the general case, for ease of presentation, we assume a model of
computation in which exponentiation takes $O(1)$ time and the word size is unbounded. In later
sections we will remove this assumption, and discuss the implementation of an iteration of the
algorithm.

Bounding  the  number  of  iterations  of  Reduce 

At the beginning of Reduce, $\alpha$ is set equal to $(1 + \epsilon)\tau^{-1}\epsilon^{-1} \log(m\epsilon^{-1})$, which is essentially the
same as in the general case. While $\lambda \ge \tau$, the value of $\alpha$ is sufficiently large, so by Lemma 2.4.3
relaxed optimality condition R1 is satisfied. If we are lucky and relaxed optimality condition
R2' is also satisfied, then it follows that $f$ and $\ell$ are $\epsilon$-optimal. Now, we show that if R2' is not
satisfied, then we can make significant progress. As before, we use $\Phi = \sum_{vw \in E} \ell(vw)u(vw)$ as
a measure of progress. In the unit-capacity case, $\Phi = \sum_{vw \in E} \ell(vw)$.
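
As a small numerical illustration of these definitions, the sketch below computes $\alpha$, the exponential lengths $\ell(vw) = e^{\alpha f(vw)}$, and the unit-capacity potential $\Phi = \sum_{vw} \ell(vw)$. The function names are hypothetical, the natural logarithm is an assumption, and the flow is stored as a plain dictionary from edges to flow values.

```python
import math

def alpha(tau, eps, m):
    """alpha = (1 + eps) / (tau * eps) * log(m / eps); natural log assumed."""
    return (1 + eps) / (tau * eps) * math.log(m / eps)

def lengths(flow, a):
    """Exponential length of every edge in the unit-capacity case."""
    return {edge: math.exp(a * f) for edge, f in flow.items()}

def potential(length):
    """Phi: the sum of the edge lengths (all capacities equal to 1)."""
    return sum(length.values())

# Moving one unit of flow from a congested edge to an empty one lowers Phi.
before = {("s", "a"): 2.0, ("s", "b"): 0.0}
after = {("s", "a"): 1.0, ("s", "b"): 1.0}
print(potential(lengths(before, 1.0)) > potential(lengths(after, 1.0)))
```

Because the length grows exponentially in the flow, balancing flow across edges reduces the potential, which is why $\Phi$ serves as the progress measure.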

Lemma 2.5.4 Suppose $\sigma_i$ and $\tau$ satisfy the granularity condition. Then rerouting $\sigma_i$ units of
flow from an $\epsilon$-long path of commodity $i$ to the shortest path with the same endpoints decreases $\Phi$

by 

Proof: Let $P$ be an $\epsilon$-long path from $s_i$ to $t_i$, and let $Q$ be a shortest $(s_i, t_i)$-path. Let $A = P - Q$,
and $B = Q - P$. The only edges whose length changes due to the rerouting are those in $A \cup B$.
The decrease in $\Phi$ is $\ell(A) + \ell(B) - e^{-\alpha\sigma_i}\ell(A) - e^{\alpha\sigma_i}\ell(B)$, which can also be written as

$$(1 - e^{-\alpha\sigma_i})(\ell(A) - \ell(B)) - (1 - e^{-\alpha\sigma_i})(e^{\alpha\sigma_i} - 1)\,\ell(B).$$
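
The algebraic rewriting of the decrease can be checked numerically. The following sketch evaluates both forms, writing $x$ for $\alpha\sigma_i$; the function names and sample values are illustrative assumptions.

```python
import math

def decrease_direct(len_a, len_b, x):
    """l(A) + l(B) - e^{-x} l(A) - e^{x} l(B), with x = alpha * sigma_i."""
    return len_a + len_b - math.exp(-x) * len_a - math.exp(x) * len_b

def decrease_factored(len_a, len_b, x):
    """(1 - e^{-x})(l(A) - l(B)) - (1 - e^{-x})(e^{x} - 1) l(B)."""
    g = 1 - math.exp(-x)
    return g * (len_a - len_b) - g * (math.exp(x) - 1) * len_b

for len_a, len_b, x in [(5.0, 2.0, 0.1), (3.0, 3.0, 0.25), (10.0, 1.0, 0.5)]:
    assert math.isclose(decrease_direct(len_a, len_b, x),
                        decrease_factored(len_a, len_b, x))
```

The two expressions agree because $(1 - e^{-x})\,e^{x} = e^{x} - 1$.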

The  granularity  condition,  the  dehnition  of  a,  and  the  assumption  that  €  <  1/12,  imply  that 
0(^1  <  <  f/2  <  1/2.  For  0  <  I  <  |,  we  have  c*  >  1  +  *,  c*  <  1  +  |a;,  and  e~*  <  1  -  |i. 

Thus  the  decrease  is  at  least 

(/(A)  -  /(B))  -  (aa,)  /(B).  (2.24) 

Now, observe that $\ell(A) - \ell(B)$ is the same as $\ell(P) - \ell(Q)$, and $\ell(Q) = \mathrm{dist}_\ell(s_i, t_i)$. Also,
$\ell(B) \le \ell(P)$. Plugging these bounds into (2.24) yields a lower bound of

|a<T,  (/(P)  -  dist,(s„t.))  -  /(P).  (2.25) 

But $P$ is $\epsilon$-long, so the quantity in (2.25) must be at least


iE 


W  have  seen  that  |  >  ao,,  which  implies  that  jt  >  fcKTi,  and  therefore  the  first  term 
dominates  the  second  term.  Thus,  the  third  term  gives  a  lower  bound  on  the  decrease  in 
Substituting the value of $\alpha$ and using the fact that during the execution of Reduce we have
$\tau \le \lambda$, we obtain the claim of the lemma. ■

The  following  theorem  bounds  the  number  of  iterations  in  Reduce. 


Theorem 2.5.5 If, for each commodity $i$, the values $\tau$ and $\sigma_i$ satisfy the granularity condition
and we have $\lambda = O(\tau)$ initially, then the procedure Reduce terminates after $O(\epsilon^{-1} \max_i \frac{\min\{D, kd_i\}}{\sigma_i})$
iterations. If in addition the input flow $f$ is $O(\epsilon)$-optimal, then the procedure Reduce terminates
after $O(\max_i \frac{\min\{D, kd_i\}}{\sigma_i})$ iterations.

Unit($I$, $\epsilon$)
    For each commodity $i$: $\sigma_i \leftarrow d_i$, create a simple path from $s_i$ to $t_i$ and route $d_i$ flow on it.
    $\tau \leftarrow \lambda/2$.
    while $f$ is not $12\epsilon$-optimal
        for every $i$
            until $\sigma_i$ and $\tau$ satisfy the granularity condition
                $\sigma_i \leftarrow \sigma_i/2$.
        $f \leftarrow$ Reduce($f$, $\tau$, $\epsilon$, $\sigma$).
        $\tau \leftarrow \tau/2$.
    return $f$.

Figure 2.13: Procedure Unit.

Proof :  The  same  as  the  proof  of  Lemma  2.4.6  with  max^  log  m  substituted  for 

■ 

In most cases, one iteration of the loop in Reduce is dominated by the time spent in
FindPath, so we concentrate on bounding the number of calls to FindPath. Although there
are some cases in which the call to FindPath is not the dominant part of an iteration of
Reduce, we assume for now that it is. In Section 2.5.5, we will see a case when the time spent
on calls to FindPath is not the dominant step, and will deal with it separately there.

Solving  Unit  Capacity  Concurrent  Flow 

In this section, we give approximation algorithms for the concurrent flow problem with uniform
capacities. We describe two algorithms: Unit and ScalingUnit. Algorithm Unit is simpler
and is best if $\epsilon$ is constant. ScalingUnit gradually scales $\epsilon$ to the right value and is faster for
$\epsilon = o(1)$.

The presentation of this section is slightly different than in the general case. Instead of
expressing the running times in terms of the number of minimum-cost flow computations, we
express them in terms of the number of calls to the procedure FindPath. FindPath has both a
deterministic and a randomized implementation, so we will put off the difference between the
two until we discuss FindPath in more detail.

Algorithm Unit (see Figure 2.13) consists of a sequence of calls to procedure Reduce
described in the previous section. The initial flow is constructed by routing each commodity
$i$ on a single flow path from $s_i$ to $t_i$. Initially, we set $\sigma_i = d_i$. Before each call to Reduce
we divide the flow quantum $\sigma_i$ by 2 for each commodity where this is needed to satisfy the
granularity condition (2.23). Each call to Reduce modifies the multicommodity flow $f$ so
that either $\lambda$ decreases by a factor of 2 or $f$ becomes $\epsilon$-optimal. In the latter case algorithm
Unit terminates and returns the flow. As we will see, $O(\log m)$ calls to Reduce will
suffice to achieve $\epsilon$-optimality.
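
The quantum-halving step performed before each call to Reduce can be sketched as follows. The granularity bound is taken here to be $\epsilon^2\tau/(3\log(m\epsilon^{-1}))$, which is an assumed reading of condition (2.23); the helper names and the natural logarithm are likewise assumptions.

```python
import math

def granularity_bound(tau, eps, m):
    """Assumed form of the bound in (2.23): eps^2 * tau / (3 log(m / eps))."""
    return eps ** 2 * tau / (3 * math.log(m / eps))

def shrink_quanta(sigma, tau, eps, m):
    """Halve each flow quantum until it satisfies the granularity bound."""
    bound = granularity_bound(tau, eps, m)
    result = []
    for s in sigma:
        while s > bound:
            s /= 2
        result.append(s)
    return result

quanta = shrink_quanta([8.0, 0.001], tau=4.0, eps=0.1, m=100)
print(quanta)
```

Because each quantum is only ever halved, a quantum that was just reduced is within a factor of 2 of the bound, so it is either still equal to $d_i$ or proportional to the bound.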

Theorem 2.5.6 Let $T_F$ be the time used by procedure FindPath. The algorithm Unit finds
an $\epsilon$-optimal multicommodity flow in $O((\epsilon^{-1}k \log n + \epsilon^{-3}m \log n)\,T_F)$ time.

Proof: Immediately after the initialization, we have $\lambda \le D$. To bound the number of phases
we need a lower bound on the minimum value of $\lambda$. Observe that for every multicommodity
flow $f$, the total amount of flow in the network is $D$. Every unit of flow contributes to the total
flow on at least one of the edges, and hence $\sum_{vw \in E} f(vw) \ge D$. Therefore,

$$\lambda \ge D/m, \qquad (2.26)$$

which implies that the number of iterations of the main loop of Unit is $O(\log m) = O(\log n)$.

By Theorem 2.5.5 procedure Reduce executes $O(\epsilon^{-1} \max_i \frac{\min\{D, kd_i\}}{\sigma_i})$ iterations during a single call to
Reduce. Throughout the algorithm, for each $i$, $\sigma_i$ is either equal to $d_i$, or is $\Theta(\epsilon^2\tau/\log(m\epsilon^{-1}))$.
In the first case,

$$\frac{\min\{D, kd_i\}}{\sigma_i} = \frac{\min\{D, kd_i\}}{d_i} = \min\left\{\frac{D}{d_i}, k\right\} \le k.$$

In the second case

$$\frac{\min\{D, kd_i\}}{\sigma_i} \le \frac{D}{\sigma_i} = O\left(\frac{D}{\epsilon^2\tau} \log(m\epsilon^{-1})\right).$$

Thus, the total number of iterations of the loop of Reduce is $O(\epsilon^{-1}k + \epsilon^{-3}\tau^{-1}D \log(m\epsilon^{-1}))$. The
value $\tau$ is halved at every iteration, and therefore the total number of calls required for all
iterations is $O(\epsilon^{-1}k \log n)$ plus twice the number required for the last iteration of Unit. It
follows from (2.26) that $\tau = \Omega(D/m)$, and the total number of iterations of the loop of Reduce
is at most $O(\epsilon^{-1}k \log n + \epsilon^{-3}m \log n)$. ■

ScalingUnit($I$, $\epsilon$)
    $\epsilon' \leftarrow 1/12$.
    Call Unit($I$, $\epsilon'$), and let $f$ be the resulting flow.
    $\tau \leftarrow \tau/2$.
    while $\epsilon' > \epsilon$,
        $\epsilon' \leftarrow \epsilon'/2$.
        for every $i$,
            until $\sigma_i$ and $\tau$ satisfy the granularity condition,
                $\sigma_i \leftarrow \sigma_i/2$.
        $f \leftarrow$ Reduce($f$, $\tau$, $\epsilon'$, $\sigma$).
    return $f$.

Figure 2.14: Procedure ScalingUnit.

If $\epsilon = o(1)$, we use the algorithm ScalingUnit, shown in Figure 2.14. The
algorithm starts with a large $\epsilon$ and then gradually scales $\epsilon$ down to the required value. More
precisely, algorithm ScalingUnit starts by applying algorithm Unit with $\epsilon = 1/12$.
ScalingUnit then repeatedly divides $\epsilon$ by a factor of 2 and calls Reduce. After the initial call to
Unit, $f$ is 1-optimal, and $\lambda$ is no more than twice the minimum possible value. Therefore,
$\lambda$ cannot be decreased below $\tau/2$, and each subsequent call to Reduce returns an $\epsilon$-optimal
multicommodity flow (with the current value of $\epsilon$). As in Unit, each call to Reduce uses the
largest flow quantum $\sigma$ permitted by the granularity condition (2.23).

Theorem 2.5.7 Let $T_F$ be the time taken by procedure FindPath. Then algorithm ScalingUnit
finds an $\epsilon$-optimal multicommodity flow in $O((k + m\epsilon^{-2}) \log n \cdot T_F)$ time.

Proof: As is stated in Theorem 2.5.6, the call to procedure Unit uses $O((k + m) \log n)$ calls
to FindPath and returns a 1-optimal multicommodity flow $f$. Hence, $\lambda$ is no more than twice
the minimum. Therefore all subsequent calls to Reduce return an $\epsilon'$-optimal multicommodity
flow $f$.

The time required by one iteration is dominated by the call to Reduce. The input flow $f$ of
Reduce is $24\epsilon'$-optimal, so, by Theorem 2.5.5, Reduce executes $O(\max_i \frac{\min\{D, kd_i\}}{\sigma_i})$ iterations
of FindPath. We have seen in the proof of Theorem 2.5.6 that $\max_i \frac{\min\{D, kd_i\}}{\sigma_i}$ is at most
$O(k + \epsilon'^{-2} m \log n)$. The value of $\epsilon'$ is reduced by a factor of 2 in every iteration. So the total
number of calls to FindPath is

$$\sum_{\epsilon'} O(k + \epsilon'^{-2} m \log n) = \sum_{\epsilon'} O(k) + \sum_{\epsilon'} O(\epsilon'^{-2} m \log n),$$

where each sum ranges over the values of $\epsilon'$ used in the iterations.

There are at most $\log(\epsilon^{-1}) = O(\log n)$ iterations, so the first term sums to $O(k \log n)$. The
second sum is a geometric series and is no more than twice the last term, so it is bounded by
$O(\epsilon^{-2} m \log n)$. Combining this bound with the bound on procedure Unit from Theorem 2.5.6
yields the claim. ■

2.5.3  Implementing  One  Iteration 

We  have  shown  that  Reduce  terminates  after  a  small  number  of  iterations.  It  remains  to  show 
that  each  iteration  can  be  carried  out  quickly.  Reduce  consists  of  three  steps:  computing 
lengths, executing FindPath and rerouting flow. Assuming that exponentiation can be performed
in $O(1)$ time, computing lengths takes $O(m)$ time. Thus, we focus our attention on the
other  two  steps. 

We first consider the time taken by procedure FindPath. We shall give three implementations
of this procedure. First, we will give a simple deterministic implementation that runs in
$O(k^*(m + n \log n) + n \sum_i d_i/\sigma_i)$ time, then a more sophisticated implementation that runs in time
$O(k^* n \log n + m(\log n + \min\{k, k^*(\log d_{\max} + 1)\}))$, and finally a randomized implementation
that runs in expected $O(\epsilon^{-1}(m + n \log n))$ time. All of these algorithms use the shortest-paths
algorithm of Fredman and Tarjan [17] that runs in $O(m + n \log n)$ time.

To deterministically find a bad flow path, we first compute, for each source node $s_i$, the
length of the shortest path from $s_i$ to every other node $v$, which takes $O(k^*(m + n \log n))$
time. In the simplest implementation, we then compute the length of every flow path in $\mathcal{P}$ and
compare its length to the length of the shortest path to decide if the path is $\epsilon$-long. There
can be at most $\sum_i d_i/\sigma_i$ flow paths, each consisting of up to $n$ edges, and hence computing these
lengths takes $O(n \sum_i d_i/\sigma_i)$ time.

To decrease the time required for FindPath, we must find an $\epsilon$-long path, if one exists,
without computing the length of every path. The following lemma explains how to achieve this:

Lemma 2.5.8 The total time required for deterministically implementing an iteration of Reduce
(assuming that exponentiation is a single step) is $O(k^* n \log n + m(\log n + \min\{k, k^*(\log d_{\max} + 1)\}))$.

If there is an $\epsilon$-long flow path for commodity $i$ then the longest flow path for commodity
$i$ must be $\epsilon$-long. Thus, instead of looking for an $\epsilon$-long path in $\mathcal{P}_i$ for some commodity $i$, it
suffices to find an $\epsilon$-long path in the directed graph obtained by taking all flow paths in $\mathcal{P}_i$, and
treating the paths as directed away from $s_i$. In order to see if there is an $\epsilon$-long path, we need
to compute the length of the longest path from $s_i$ to $t_i$ in this directed graph. To facilitate this
computation, we shall maintain that the directed flow graph is acyclic.

Let $G_i$ denote the flow graph of commodity $i$. If $G_i$ is acyclic, an $O(m)$ time dynamic
programming computation suffices to compute the longest paths from $s_i$ to every other node.
Suppose that during an iteration we reroute flow from an $\epsilon$-long path from $s_i$ to $t_i$ in the flow
graph $G_i$. We must first update the flow graph $G_i$ to reflect this change. Second, the update
might introduce directed cycles in $G_i$, so we must eliminate such cycles of flow. We use an
algorithm due to Sleator and Tarjan [63] to implement this process. Sleator and Tarjan gave
a simple $O(nm)$ algorithm and a more sophisticated $O(m \log n)$ algorithm for the problem of
converting an arbitrary flow into an acyclic flow.
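
The linear-time dynamic program for longest paths in an acyclic flow graph can be sketched as below. This is a generic topological-order computation, not code from the thesis; the node numbering $0, \ldots, n-1$, the edge-list representation, and the function name are assumptions for illustration.

```python
from collections import defaultdict, deque

def longest_paths(n, edges, source):
    """Longest path lengths from `source` in a DAG.

    edges: list of (u, v, length) triples over nodes 0..n-1.
    Unreachable nodes get -inf. Runs in O(n + m) time.
    """
    adj = defaultdict(list)
    indegree = [0] * n
    for u, v, w in edges:
        adj[u].append((v, w))
        indegree[v] += 1
    # Kahn's algorithm yields a topological order of the nodes.
    order = []
    queue = deque(v for v in range(n) if indegree[v] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in adj[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != n:
        raise ValueError("flow graph contains a directed cycle")
    best = [float("-inf")] * n
    best[source] = 0.0
    for u in order:  # relax edges in topological order
        if best[u] == float("-inf"):
            continue
        for v, w in adj[u]:
            best[v] = max(best[v], best[u] + w)
    return best

print(longest_paths(3, [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 2.5)], 0))
```

The cycle check shows why acyclicity must be maintained: the dynamic program is only well defined when a topological order exists.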

Eliminating cycles only decreases the flows on edges, so it cannot increase $\Phi$. Thus the
bound from Theorem 2.5.5 on the number of iterations in Reduce still holds.

We compute the total time required for each iteration of Reduce as follows. In order to
implement FindPath, we must compute a shortest path from $s_i$ to $t_i$ in $G$ and the longest
path from $s_i$ to $t_i$ in $G_i$ for every commodity $i$, so the time required is $O(k^*(m + n \log n) + km)$.
Furthermore, after each rerouting, we must update the appropriate flow graph and eliminate
cycles. Elimination of cycles takes $O(m \log n)$ time. Combining these bounds gives an
$O(k^* n \log n + m(k + \log n))$ bound on the running time of FindPath.

In fact, further improvement is possible if we consider the flow graphs of all commodities
with the same source and same flow quantum $\sigma_i$ together. Let $G_v^\sigma$ be the directed graph
obtained by taking the union of all flow paths $P \in \mathcal{P}_i$ for a commodity $i$ with $s_i = v$ and
$\sigma_i = \sigma$, treating each path as directed away from $v$. If $G_v^\sigma$ is acyclic, an $O(m)$ time dynamic
programming computation suffices to compute the longest paths from $v$ to every other node in $G_v^\sigma$.

During our concurrent flow algorithm all commodities with the same demand have the same
flow quantum. To limit the number of different flow graphs that we have to consider, we want to limit the
number of different demands. By decomposing demand $d_i$ into at most $\lfloor \log d_i \rfloor + 1$ demands
with source $s_i$ and sink $t_i$, we can assume that each demand is a power of 2. This way the
number of different flow graphs that we have to maintain is at most $k^*(\log d_{\max} + 1)$. ■
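
The decomposition of a demand into powers of 2 is simply its binary expansion. A minimal sketch, assuming integral demands (the function name is hypothetical):

```python
def split_demand(d):
    """Split a positive integer demand into distinct powers of 2.

    The number of parts is at most floor(log2 d) + 1, the bit length of d.
    """
    assert d >= 1
    return [1 << b for b in range(d.bit_length()) if (d >> b) & 1]

print(split_demand(13))
```

For a demand of 13 this produces the parts 1, 4 and 8, each of which becomes a separate commodity with the same source and sink.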

Next, we give a randomized implementation of FindPath that is much faster when $\epsilon$ is not
too small; this implementation is similar to the randomized implementation of the general case.
If $f$ and $\ell$ are not $\epsilon$-optimal, then relaxed optimality condition R2' is not satisfied, and thus
$\epsilon$-long paths contribute at least an $\epsilon$-fraction of the total sum $\sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i} \ell(P) f_i(P)$. Therefore,
by randomly choosing a flow path $P$ with probability proportional to its contribution to the
above sum, we have at least an $\epsilon$ chance of selecting an $\epsilon$-long path. Furthermore, we will
show that we can select a candidate $\epsilon$-long path according to the right probability in $O(m)$
time. Then we can compute a shortest path with the same endpoints in $O(m + n \log n)$ time,
which enables us to determine whether or not $P$ was an $\epsilon$-long path. Thus we can implement
FindPath in $O(\epsilon^{-1}(m + n \log n))$ expected time.

The  contribution  of  a  flow  path  P  to  the  above  sum  is  just  the  length  of  P  times  the  flow  on 
P,  so  we  must  choose  P  with  probability  proportional  to  this  value.  In  order  to  avoid  examining 
all  such  flow  paths  explicitly,  we  use  a  two-step  procedure,  as  described  in  the  following  lemma. 

Lemma 2.5.9 If we choose an edge $vw$ with probability proportional to $\ell(vw) f(vw)$, and then
we select a flow path among paths through this edge $vw$ with probability proportional to the value
of the flow carried on the path, then the probability that we have selected a given flow path $P$ is
proportional to its contribution to the sum $\sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i} \ell(P) f_i(P)$.


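Proof: Let $B = \sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i} \ell(P) f_i(P)$. Select an edge $vw$ with probability $f(vw)\ell(vw)/B$
(these probabilities sum to 1, because exchanging the order of summation gives $B = \sum_{vw \in E} f(vw)\ell(vw)$).
Once an edge $vw$ is selected, choose a path $P \in \mathcal{P}_i$ through edge $vw$ with probability $f_i(P)/f(vw)$.
Consider a commodity $i$ and a path $P \in \mathcal{P}_i$.

$$\Pr(P \text{ chosen}) = \sum_{vw \in P} \Pr(vw \text{ chosen}) \cdot \Pr(P \text{ chosen} \mid vw \text{ chosen})$$

$$= \sum_{vw \in P} \frac{f(vw)\ell(vw)}{B} \cdot \frac{f_i(P)}{f(vw)}$$

$$= \sum_{vw \in P} \frac{\ell(vw) f_i(P)}{B}$$

$$= \frac{\ell(P) f_i(P)}{B}. \quad ■$$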
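
The identity in this proof can be checked numerically. The sketch below is an illustration only: the input format (each path as a tuple of edge names with a flow value) and the function name are assumptions, paths are assumed to be simple, and exact rational arithmetic is used so that the two-step probability can be compared with $\ell(P) f_i(P)/B$ without rounding error.

```python
from fractions import Fraction


def two_step_path_probabilities(paths, length):
    """Exact probability of selecting each flow path by the two-step procedure.

    `paths` is a list of (edges, flow) pairs, where `edges` is a tuple of edge
    names and `flow` is the flow carried by that path. `length` maps each edge
    name to its length. Returns (probabilities, B).
    """
    # Total flow on each edge is the sum of the flows of the paths using it.
    edge_flow = {}
    for edges, flow in paths:
        for e in edges:
            edge_flow[e] = edge_flow.get(e, 0) + flow

    # B, computed edge by edge.
    B = sum(Fraction(length[e]) * edge_flow[e] for e in edge_flow)

    probabilities = []
    for edges, flow in paths:
        p = Fraction(0)
        for e in edges:
            pr_edge = Fraction(length[e]) * edge_flow[e] / B   # step 1
            pr_path_given_edge = Fraction(flow) / edge_flow[e]  # step 2
            p += pr_edge * pr_path_given_edge
        probabilities.append(p)
    return probabilities, B
```

With three paths carrying flows 2, 1 and 3 over edges of lengths 3, 1, 5 and 2, as in the test data, the resulting probabilities are exactly each path's length times its flow divided by $B = 22$.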


Choosing an edge with probability proportional to $\ell(vw) f(vw)$ can easily be done in $O(m)$
time. In order to then choose with the right probability a flow path going through that edge,
we need a data structure to organize these flow paths. For each edge we maintain a balanced
binary tree with one leaf for each flow path through the edge, labeled with the flow value of
that flow path. Each internal node of the binary tree is labeled with the total flow value of its
descendant leaves. The number of paths is polynomial in $n$ and $\epsilon^{-1}$, and therefore using this
data structure, we can randomly choose a flow path through a given edge in $O(\log n)$ time.
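
One way to picture this tree is the following sketch. It is a simplification under stated assumptions: instead of a self-balancing tree it uses a complete binary tree over a fixed number of slots stored in an array, and the class and method names are hypothetical. Each internal node stores the total flow of the leaves below it, so both updating a leaf and locating the leaf for a random value take time logarithmic in the number of slots.

```python
import random


class FlowTree:
    """Complete binary tree whose internal nodes store subtree flow totals."""

    def __init__(self, capacity):
        # Round the number of leaves up to a power of two.
        self.size = 1
        while self.size < capacity:
            self.size *= 2
        self.tree = [0] * (2 * self.size)

    def set_flow(self, slot, value):
        """Set the flow of the path stored in `slot` and update its ancestors."""
        i = slot + self.size
        self.tree[i] = value
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def total(self):
        return self.tree[1]

    def find(self, target):
        """Return the slot whose cumulative-flow interval contains `target`.

        Requires 0 <= target < total().
        """
        i = 1
        while i < self.size:
            left = self.tree[2 * i]
            if target < left:
                i = 2 * i
            else:
                target -= left
                i = 2 * i + 1
        return i - self.size

    def sample(self, rng=random):
        """Pick a slot with probability proportional to its flow."""
        return self.find(rng.random() * self.total())
```

Deleting a path corresponds to setting its slot to zero; slots with zero flow are never returned by `find`.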

In order to maintain this data structure, each time we change the flow on an edge, we must
update the binary tree for that edge, at a cost of $O(\log n)$ time. In one iteration of Reduce
the flow only changes on $O(n)$ edges, and therefore the time to do these updates is $O(n \log n)$
per call to FindPath, which is dominated by the time to compute single-source shortest paths.

We have shown that if relaxed optimality condition R2' is not satisfied, then, with probability
at least $\epsilon$, we can find an $\epsilon$-long path in $O(m + n \log n)$ time. FindPath continues to pick
paths until either an $\epsilon$-long path is found or $1/\epsilon$ trials are made. Observe that given that $f$ and
$\ell$ are not yet $\epsilon$-optimal (which implies that condition R2' is not yet satisfied), the probability
of failing to find an $\epsilon$-long path in $1/\epsilon$ trials is bounded by $1/e$. Thus, in this case, Reduce can
terminate, claiming that $f$ and $\ell$ are $\epsilon$-optimal with probability at least $1 - 1/e$. Computing
lengths and updating flows can each be done in $O(n \log n)$ time, and thus we get the following
bound:

Lemma 2.5.10 One iteration of Reduce can be implemented to run in expected
$O(\epsilon^{-1}(m + n \log n))$ time (assuming that exponentiation is a single step). ■

The randomized algorithm, as it stands, is Monte Carlo; there is a non-zero probability
that Reduce erroneously claims to terminate with an $\epsilon$-optimal $f$. To make the algorithm Las
Vegas (never wrong, sometimes slow), we introduce a deterministic check. If FindPath fails to
find an $\epsilon$-long path, Reduce computes the sum $\sum_i \mathrm{dist}_\ell(s_i, t_i) d_i$ to the required precision and
compares it with $\lambda \sum_{vw \in E} \ell(vw) u(vw)$ to determine whether $f$ and $\ell$ are really $\epsilon$-optimal. If
not, the loop resumes. The time required to compute the sum is $O(k^*(m + n \log n))$, because at
most $k^*$ single-source shortest path computations are required. The probability that the check
must be done $t$ times in a single call to Reduce is at most $(e^{-1})^{t-1}$, so the total expected
contribution to the running time of Reduce is at most $O(k^*(m + n \log n))$.

Recall  that  the  number  of  iterations  of  Reduce  is  greater  than  max^  which  in  turn 

is  at  least  k.  Since  in  each  iteration  we  carry  out  at  least  one  shortest  path  computation,  the 
additional  time  spent  on  checking  does  not  asymptotically  increase  our  bound  on  the  running 
time  for  Reduce. 

2.5.4  Dealing  with  Exponentiation 

To remove the assumption that exponentiation can be performed in $O(1)$ time, we shall do
two things. First we shall show that it is sufficient to work with edge lengths $\tilde{\ell}(vw)$ that are
approximations to the actual lengths $\ell(vw) = e^{\alpha f(vw)}$. We then show that computing these
approximate edge lengths does not change the asymptotic running times of our algorithms.

In fact, we show that for large values of $\epsilon$ (e.g., when $\epsilon$ is a constant), the time required
for  FindPath  can  be  reduced  by  using  approximate  lengths.  To  do  so  requires  two  changes: 
using  a  different  implementation  of  Dijkstra’s  algorithm  and  using  a  more  sophisticated  data 
structure  for  storing  the  flow  paths  going  through  an  edge. 

The  first  step  is  to  note  that  in  the  proof  of  Lemma  2.5.4,  we  never  used  the  fact  that  we 
reroute  flow  onto  a  shortest  path.  We  only  need  that  we  reroute  flow  onto  a  sufficiently  short 
path. More precisely, it is easy to convert the proof of Lemma 2.5.4 into a proof for the following
claim. The conversion is similar to that used to prove Lemma 2.4.24 via Lemma 2.4.8.

Lemma 2.5.11 Suppose that $\sigma_i$ and $\tau$ satisfy the granularity condition and let $P$ be a flow path
of commodity $i$. Let $Q$ be a path connecting the endpoints of $P$ such that the length of $Q$ is
no  more  than  €t{P)f2  +  greater  than  the  length  of  the  shortest  path  connecting  the  same 

endpoints.  Then  rerouting  <7,  units  of  flow  from  path  PtoQ  decreases  ♦  by  fl(  tj^^logm)). 


We  now  show  that  in  order  to  compute  the  lengths  of  paths  up  to  the  precision  given  in 
this  lemma,  we  only  need  to  compute  the  lengths  of  edges  up  to  a  reasonably  small  amount  of 
precision. 


By Lemma 2.5.11, the length of a path can have a rounding error of

$$\epsilon \, \frac{\lambda}{2D} \sum_{vw \in E} \ell(vw) u(vw).$$

Each path has at most $n$ edges, so it suffices to ensure that each edge has a rounding error of

$$\frac{\epsilon}{n} \left( \frac{\lambda}{D} \sum_{vw \in E} \ell(vw) u(vw) / 2 \right). \qquad (2.27)$$

We shall now bound this quantity. The value $\lambda$ is the maximum flow on an edge and hence
must be at least as large as the average flow on an edge, i.e., $\lambda \ge \sum_{vw \in E} f(vw)/m$. Every unit of
flow contributes to the total flow on at least one edge, and hence $\sum_{vw \in E} f(vw) \ge D$. Combining
with the previous inequality, we get that

$$\lambda / D \ge 1/m. \qquad (2.28)$$

The potential function $\Phi$ is at least the length of the longest edge, i.e.,

$$\sum_{vw \in E} \ell(vw) u(vw) \ge e^{\alpha\lambda}. \qquad (2.29)$$

Plugging (2.28) and (2.29) into (2.27), we see that it suffices to compute the length of an edge
with an error of at most $\epsilon e^{\alpha\lambda}/(2nm)$. Each edge has a positive length of at most $e^{\alpha\lambda}$ and can be
expressed as $e^{\alpha\lambda}\rho$, where $0 < \rho \le 1$. Thus we need to compute $\rho$ up to an error of $\epsilon/(2nm)$. To do
so, we need to compute $O(\log(\epsilon^{-1} n m))$ bits, which by the assumption that $\epsilon^{-1}$ is polynomial
in $n$, is just $O(\log n)$ bits.

By using the Taylor series expansion of $e^x$, we can compute one bit of the length function
in $O(1)$ time. Therefore, to compute the lengths of all edges at each iteration of Reduce, we
need $O(m \log n)$ time. We shall see that in the deterministic implementation of Reduce each
iteration takes $\Omega(m \log n)$ time (the time required for cycle canceling). Therefore the time spent
on computing the lengths is dominated by the running time of an iteration.

The approximation above depends on the current value of $\lambda$, which may change after each
iteration. It was crucial that we recomputed the lengths of every edge in every iteration. The
time to do so, $O(m \log n)$, would dominate the running time of the randomized implementation
of Reduce. (Recall that the randomized implementation does not do cycle canceling.) Thus,
we need to find an approximation that does not need to be recomputed at every iteration. We
choose one that does not depend on the current $\lambda$ and hence only needs to be updated on the
$O(n)$ edges on which the flow actually changes. We proceed to describe such an approximation
that depends on $\tau$ rather than $\lambda$.

Throughout  Reduce  all  edge  length  are  at  most  and  at  least  one  edge  has  length 

more  than  c®’’.  Therefore,  Dvuiec  same  argument  as  for 

the deterministic case, $O(\log n)$ bits of precision suffice throughout Reduce. When we first call
Reduce, we must spend $O(m \log n)$ time to compute the edge lengths. For each subsequent
iteration, we only need to spend $O(n \log n)$ time updating the $O(n)$ edges whose length has
changed. Since each iteration of Reduce is expected to take $O(\epsilon^{-1}(m + n \log n))$ time to
compute shortest paths in FindPath, the time for updating edges is dominated by the time
required by FindPath. While it appears that the time to initially compute all the edge lengths
may dominate the time spent in one call to Reduce, as we have seen, whenever any of our
algorithms calls Reduce, it performs $\Omega(\log n)$ iterations. Each iteration is expected to take
$\Omega(\epsilon^{-1} m)$ time to compute the shortest paths in FindPath. Therefore, the time spent on
initializing lengths is dominated by the running time of Reduce.

In describing the randomized version of FindPath in Lemma 2.5.9, we assumed we knew
the exact lengths. By using the approximate lengths, however, we do not significantly change a
path's apparent contribution to the sum $\sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i} \ell(P) f_i(P)$. Hence, we do not significantly
reduce the probability of selecting a bad path.

Thus we have shown that without any assumptions, Reduce can be implemented
deterministically in the same time as is stated in Lemma 2.5.8. Although for the randomized version
there is additional initialization time, for all the algorithms in this chapter the initialization
time is dominated by the time spent in the iterations of Reduce.

Theorem 2.5.12 The running times required for the deterministic implementations of procedure
Reduce stated in Lemma 2.5.8 hold without the assumption that exponentiation takes $O(1)$
time. The times required by the randomized implementations increase by an additive term of
$O(\epsilon^{-1} m \log n)$ without this assumption.

2.5.5  Further  implementation  details 

In this section we show how one can reduce the time per iteration of Reduce for the case
in which $\epsilon$ is a constant. First, we show how using approximate lengths can reduce the time
required by FindPath; we use an approximate shortest-paths algorithm that runs in
$O(m + n\epsilon^{-1})$ time. Then, we give improved implementation details for an iteration of Reduce to
decrease the time required by other parts of Reduce.

We now describe how, given the lengths $\ell$ and an $\epsilon$-long path $P$ from $s$ to $t$, we can find, in
$O(m + n\epsilon^{-1})$ time, a path $Q$ with the same endpoints such that $\ell(Q) \le \mathrm{dist}_\ell(s,t) + \epsilon\ell(P)/2$.
First, we discard all edges with length greater than $\ell(P)$, for they can never be in a path
that is shorter than $P$ (if $P$ is a shortest path between $s$ and $t$ then $P$ is not an $\epsilon$-long path).
Next, on the remaining graph, we compute shortest paths from $s$ using approximate edge
lengths $\tilde{\ell}(vw) = \left\lceil \ell(vw) \cdot \frac{2n}{\epsilon\ell(P)} \right\rceil \cdot \frac{\epsilon\ell(P)}{2n}$, thus giving us $\widetilde{\mathrm{dist}}_\ell(s,t)$, an approximation of $\mathrm{dist}_\ell(s,t)$,
the length of the actual shortest $(s,t)$-path. There are at most $n - 1$ edges on any shortest path,
and for each such edge, the approximate length is at most $\epsilon\ell(P)/(2n)$ more than the actual length.
Thus we know that

$$\widetilde{\mathrm{dist}}_\ell(s,t) \le \mathrm{dist}_\ell(s,t) + n \cdot \frac{\epsilon\ell(P)}{2n} = \mathrm{dist}_\ell(s,t) + \frac{\epsilon\ell(P)}{2}.$$

Further, since each shortest path length is an integer multiple of $\epsilon\ell(P)/(2n)$ and no more than $\ell(P)$, we
can use Dial's implementation of Dijkstra's algorithm [13] to compute $\widetilde{\mathrm{dist}}_\ell(s,t)$ in $O(m + n\epsilon^{-1})$
time.
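
A bucket-based shortest path computation of this kind can be sketched as follows. This is a minimal illustration rather than the implementation used in the thesis: it assumes the rounded edge lengths have already been expressed as non-negative integers (multiples of the rounding unit), that nodes are numbered from 0, and that `max_dist` is the largest distance of interest in those units. The function name and adjacency-list format are hypothetical.

```python
def dial_shortest_paths(n, adj, source, max_dist):
    """Single-source shortest paths with small integer edge costs (Dial's buckets).

    `adj[u]` is a list of (w, cost) pairs with non-negative integer costs.
    Returns a list of distances; entries stay None for nodes whose distance
    exceeds `max_dist` or that are unreachable. Runs in O(m + max_dist) time.
    """
    dist = [None] * n
    buckets = [[] for _ in range(max_dist + 1)]
    dist[source] = 0
    buckets[0].append(source)

    for d in range(max_dist + 1):
        bucket = buckets[d]
        i = 0
        # Index-based loop: zero-cost edges may append to the current bucket.
        while i < len(bucket):
            u = bucket[i]
            i += 1
            if dist[u] != d:
                continue  # stale entry; u was later reached by a shorter path
            for w, cost in adj[u]:
                nd = d + cost
                if nd <= max_dist and (dist[w] is None or nd < dist[w]):
                    dist[w] = nd
                    buckets[nd].append(w)
    return dist
```

With the rounding described above, distances are at most $\ell(P)$, which is $2n/\epsilon$ rounding units, so the number of buckets is $O(n\epsilon^{-1})$.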
Implementing FindPath with this approximate shortest path computation directly improves
the time required by a deterministic implementation of Reduce. The randomized implementation
of FindPath with approximate shortest path computation requires $O(\epsilon^{-1}(m + n\epsilon^{-1}))$
expected time. In order to claim that an iteration of Reduce can be implemented in the same
amount of time, we must handle two difficulties: updating edge lengths and updating each
edge's table of flow paths when flow is rerouted. Previously, these steps took $O(n \log n)$ time,
which was dominated by the time for FindPath. We have reduced the time for FindPath, so
the time for these steps now dominates. We now show how to carry out these steps in $O(n)$
time. For the first step, we show that a table can be precomputed so that each edge length can
be updated in constant time. For the second step, we sketch a three-level data structure that
allows selection of a random flow path through an edge in $O(n)$ time, and allows constant-time
addition and deletion of flow paths.

Suppose that before computing the length $e^{\alpha f(vw)}$ we were to round $\alpha f(vw)$ to the nearest
multiple of $\epsilon/c$, for some constant $c$. This rounding introduces an additional multiplicative
error of $1 + O(\epsilon/c)$ in the length of each edge and hence an additional multiplicative error of
$1 + O(\epsilon/c)$ on each path. By arguments similar to those in the previous subsection, however,
this process still gives us a sufficiently precise approximation.

Now we show that by rounding in this way, there are a small enough number of possible
values for $\ell(vw)$ that we can just compute them all at the beginning of an iteration of Reduce
and then compute the length of an edge by simply looking up the value in a precomputed table.
The largest value of $\alpha f(vw)$ we ever encounter is $O(\epsilon^{-1} \log n)$. Since we are only concerned
with multiples of $\epsilon/c$, there are a total of only $O(\epsilon^{-2} \log n)$ values we can ever encounter. At
the beginning of each iteration, we compute each of these numbers to a precision of $O(\log n)$
bits in $O(\epsilon^{-2} \log^2 n)$ time. Once we have computed all these numbers, we compute the length
of an edge by computing $\alpha f(vw)$, truncating to a multiple of $\epsilon/c$, and then looking up the
value of $\ell(vw)$ in the table. This process takes $O(1)$ time. Thus for constant $\epsilon$, we are spending
$O(\log^2 n + m) = O(m)$ time per iteration.
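
The table lookup can be illustrated with the following sketch. It is an assumption-laden simplification: the library exponential `math.exp` stands in for the bit-by-bit Taylor series computation, `step` plays the role of $\epsilon/c$, `max_exponent` is an assumed upper bound on $\alpha f(vw)$, and the function names are hypothetical.

```python
import math


def build_length_table(max_exponent, step):
    """Precompute e^(j * step) for every multiple of `step` up to `max_exponent`."""
    count = int(math.floor(max_exponent / step)) + 1
    return [math.exp(j * step) for j in range(count)]


def table_length(table, step, alpha, flow):
    """Approximate e^(alpha * flow) by truncating the exponent to a multiple of `step`."""
    j = int(alpha * flow / step)  # index of the truncated exponent
    return table[j]
```

Because the exponent is truncated by less than `step`, the looked-up value is below the exact length by a multiplicative factor of at most $e^{\text{step}}$, which is $1 + O(\epsilon/c)$ when `step` is $\epsilon/c$.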

Now, we address the problem of maintaining, for each edge, the flow paths going through
that edge. Henceforth, we will describe the data structure associated with a single edge. First,
suppose that all flow paths carry the same amount of flow, i.e., $\sigma_i$ is the same for each. In this
case, we keep pointers to the flow paths in an array. We maintain that the array is at most 1/4
empty. It is then possible to randomly select a flow path in constant expected time as follows:
one randomly chooses an index and checks whether the corresponding array entry has a pointer
to a flow path. If so, select that flow path. If not, try another index.

One  can  delete  flow  paths  from  the  array  in  constant  time.  If  one  maintains  a  list  of  empty 
entries,  one  can  also  insert  in  constant  time.  If  the  array  gets  too  full,  copy  its  contents  into 
a  new  array  of  twice  the  size.  The  time  required  for  copying  can  be  amortized  over  the  time 
required  for  the  insertions  that  filled  the  array.  If  the  array  gets  too  empty,  copy  its  contents 
into  a  new  array  of  half  the  size.  The  time  required  for  copying  can  be  amortized  over  the 
time  required  for  the  deletions  that  emptied  the  array.  (See,  for  example,  [12],  for  a  detailed 
description  of  this  data  structure.) 
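
The array scheme just described can be sketched as follows. This is an illustration under assumptions, not the structure from [12]: a dictionary stands in for the pointer from a flow path back to its array entry, paths are assumed to be hashable, the shrink threshold (a quarter full) is a choice made for this sketch, and the class name is hypothetical.

```python
import random


class PathArray:
    """Array of equal-flow paths with O(1) amortized insert/delete and rejection sampling."""

    def __init__(self):
        self.slots = [None]   # array entries; None marks an empty entry
        self.free = [0]       # indices of empty entries
        self.index = {}       # path -> index of its entry

    def __len__(self):
        return len(self.index)

    def insert(self, path):
        if not self.free:
            self._resize(2 * len(self.slots))  # array is full: double it
        i = self.free.pop()
        self.slots[i] = path
        self.index[path] = i

    def delete(self, path):
        i = self.index.pop(path)
        self.slots[i] = None
        self.free.append(i)
        # Shrink when fewer than a quarter of the entries are in use.
        if len(self.slots) > 1 and 4 * len(self.index) < len(self.slots):
            self._resize(len(self.slots) // 2)

    def _resize(self, new_size):
        paths = list(self.index)
        self.slots = paths + [None] * (new_size - len(paths))
        self.index = {p: i for i, p in enumerate(paths)}
        self.free = list(range(len(paths), new_size))

    def sample(self, rng=random):
        """Pick a uniformly random stored path by retrying on empty entries."""
        if not self.index:
            raise IndexError("no flow paths stored")
        while True:
            entry = self.slots[rng.randrange(len(self.slots))]
            if entry is not None:
                return entry
```

As long as a constant fraction of the entries is occupied, each trial succeeds with constant probability, so the expected number of trials in `sample` is constant.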

Now,  we  consider  the  more  general  case  in  which  the  flow  values  of  flow  paths  may  vary. 
In  this  case,  we  use  a  three-level  data  structure.  In  the  top  level,  the  paths  are  organized 
according  to  their  starting  nodes.  In  the  second  level,  the  paths  with  a  common  starting  node 
are  organized  according  to  their  ending  nodes.  The  paths  with  the  same  starting  and  ending 
nodes  may  be  assumed  to  belong  to  the  same  commodity,  and  hence  all  carry  the  same  amount 
of  flow.  Thus,  these  paths  can  be  organized  using  the  array  as  described  above. 

The first level consists of a list. Each list item specifies a starting node, the total flow of all
flow paths with that starting node, and a pointer to the second-level data structure organizing
the flow paths with the given starting node. Each second-level data structure also consists of
a list. Each item in the second-level list specifies an ending node, the total flow of all flow
paths with that ending node and the given starting node, and a pointer to the third-level data
structure, the array containing flow paths with the given starting and ending nodes.

Now we analyze the time required to maintain this data structure. Adding and deleting
a flow path takes $O(1)$ time. Choosing a random flow path with the right probability can be
accomplished in $O(n)$ time. First we randomly choose a value between 0 and the total flow
through the edge. Then we scan the first-level list to select an appropriate item based on the
value. Next we scan the second-level list pointed to by that item, and select an item in the
second-level list. Each of these two steps takes $O(n)$ time. Finally, we select an entry in the
third-level array. In the third-level array, all flows have the same $\sigma_i$, thus an entry can be chosen
in $O(1)$ expected time by the scheme described for the special case when all flow paths had the
same value.
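
The selection procedure over the three levels can be sketched as follows. This is a simplified illustration: Python dictionaries (which preserve insertion order) stand in for the two lists, a plain Python list with `rng.choice` stands in for the third-level array, deletion is omitted, and all names are hypothetical.

```python
import random


class EdgePaths:
    """Three-level organization of the flow paths through one edge."""

    def __init__(self):
        # start node -> [total flow, {end node -> [total flow, quantum, paths]}]
        self.first = {}

    def add(self, start, end, quantum, path):
        top = self.first.setdefault(start, [0, {}])
        group = top[1].setdefault(end, [0, quantum, []])
        group[2].append(path)
        group[0] += quantum
        top[0] += quantum

    def total(self):
        return sum(top[0] for top in self.first.values())

    def select(self, target, rng=random):
        """Return a path for a value `target` with 0 <= target < total()."""
        for start_total, second in self.first.values():      # scan first level
            if target >= start_total:
                target -= start_total
                continue
            for group_total, _quantum, paths in second.values():  # scan second level
                if target >= group_total:
                    target -= group_total
                    continue
                # All paths in this group carry the same flow: pick uniformly.
                return rng.choice(paths)
        raise ValueError("target must be smaller than the total flow")
```

Drawing `target` uniformly from the interval between 0 and the total flow makes each path's selection probability proportional to the flow it carries.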

So we have shown that for constant $\epsilon$, each of the three steps in procedure Reduce can be
implemented in $O(m)$ expected time, thus yielding the following lemma.

Lemma 2.5.13 If $\epsilon = \Theta(1)$, then procedure Reduce can be implemented in expected $O(m)$
time.

Combining the results in this section, we get theorems that summarize our results. First, we
consider the case of a constant $\epsilon$. In this case we use algorithm Unit. Combining Theorem 2.5.6,
Lemmas 2.5.8 and 2.5.10, Theorem 2.5.12 and Lemma 2.5.13, we get the following theorem:

Theorem 2.5.14 For any constant $\epsilon > 0$, algorithm Unit finds an $\epsilon$-optimal solution for the
unit-capacity concurrent flow problem in $O(m(k + m) \log n)$ expected time and in
$O(m(k + m)(\log n + \min\{k, k^* \log d_{\max}\}) \log n)$ time deterministically. ■

Using Theorem 2.5.7, Lemmas 2.5.8 and 2.5.10, and Theorem 2.5.12 we obtain the following
bounds on the running time of algorithm ScalingUnit:

Theorem 2.5.15 For $0 < \epsilon < 1/12$, algorithm ScalingUnit finds an $\epsilon$-optimal solution to the
unit-capacity  concurrent  flow  problem  in  expected  time  0((fcc”’  -F  mc~^ log  n)(m  -F  nlogn))  and 
determ  in  !-itically  in  time  0((k  -F  ^  m)  log  n)(k*n  log  n  -F  m(logn  -F  min  {A:./:*(log(L...  +  1)}))). 


2.6  Open  Problems 

The big open questions are whether the dependence on $\epsilon$ can be reduced, and whether an
algorithm similar to Concurrent can be used to get an exact algorithm for the multicommodity
flow problem. The dependence on $k$, $n$ and $m$ is certainly acceptable, since the best algorithms
for  performing  k  maximum  flows  take,  up  to  logarithmic  factors,  O(knm)  time.  Yet,  the  de¬ 
pendence  on  (  is  not  as  satisfying.  If  we  want  to  get  solutions  that  have  accuracy  e  =  o{ti~  ^), 
algorithm Concurrent takes more time than the exact linear programming algorithms.

Figure 2.15: A bad example for our algorithm


The other problem is that of obtaining an exact algorithm. As is done in interior-point
linear programming algorithms, we could run our algorithm until it is possible to do a rounding
step. In order to achieve the necessary accuracy, however, $\epsilon$ must be much too small to be
of any practical interest. Are there any approaches that would allow us to round earlier?
One drawback of our algorithm is that, given an optimal solution, it is unable to recognize it.
Consider the graph in Figure 2.15. This problem has two commodities. Commodity 1 wants to
send 2 units of flow between $v_1$ and $v_2$ and commodity 2 wants to send 1 unit of flow between $v_1$
and $v_3$. Suppose that as an initial solution we choose the routing that appears in the graph on
the right. The flow of commodity 1 is represented by heavy lines and the flow for commodity
2 is represented by a dashed line. This solution has $\lambda = 1$, which is optimal. The values of
the edge lengths appear along the edges. Consider the two paths for commodity 1. The top
path  V1V2  has  cost  e®,  while  the  bottom  path  V1V3V2  has  cost  e“  -I-  Thus,  the  algorithm 

routes flow off the bottom path and onto the top path. The flow of the top path is now more
than the capacity, which causes $\lambda$ to increase. Thus we no longer have an optimal solution.

The  reason  that  this  unfortunate  phenomenon  occurs  is  that  we  require  that  the  dual  vari¬ 
ables  be  a  predetermined  function  of  the  primal  variables.  There  exist  problems  for  which  such 
a  solution  seems  hard  to  achieve.  An  algorithm  that  computes  an  exact  solution  would  probably 
need to relax this condition. Perhaps a more fruitful approach is to use our algorithm to get
close to the optimal solution and then switch to some other algorithm.


Chapter  3 


Applications of Multicommodity
Flow¹


3.1  Introduction 

The  techniques  for  solving  multicommodity  flow  problems  have  a  host  of  applications  and 
extensions.  In  Section  3.2  we  shall  discuss  a  special  case  where  it  is  possible  to  find  integral 
flows,  and  which  has  applications  to  a  VLSI  routing  problem.  In  Section  3.3,  we  show  how  the 
solution  to  a  concurrent  flow  problem  can  be  used  to  help  find  sparse  cuts  in  graphs. 


3.2  An  Integer  Theorem  for  Multicommodity  Flows 

In this section we discuss situations in which the techniques presented in Chapter 2 can be
used to obtain good integral solutions. None of the four algorithms, Concurrent,
ScalingConcurrent, Unit, or ScalingUnit, finds flows that are integral. For many applications,
it is desirable to have integral solutions. In Section 3.2.1, we will discuss an application to a
VLSI routing problem, in which the flows represent numbers of wires, and thus we want the
flow to take on integral values. In general, we cannot obtain results about integral solutions
that are as strong as the results we have obtained about non-integral solutions. For some cases
of interest, however, we can obtain rather strong results. The ability to do so is interesting
because the integer multicommodity flow problem seems to be harder than the non-integral
one: the integer problem is NP-hard, while, as we have discussed, the non-integral problem can
be solved exactly in polynomial time, using linear programming.

¹This chapter contains joint work with Tom Leighton, Fillia Makedon, Serge Plotkin, Éva Tardos and Spyros
Tragoudas [42] and joint work with Philip Klein, Serge Plotkin and Éva Tardos [35].

For the remainder of this section, we focus on the case when we have a unit-capacity
unit-demand problem in which $\lambda^* = \Omega(\log n)$. Using similar arguments one can obtain integer
solutions within some guaranteed factor of optimal for some problems in which the demands
and capacities are not uniform, but we will not pursue that direction here. In Chapter 6 we
will show how to find integer solutions to a related problem.

The unit-capacity unit-demand problem in which $\lambda^* = \Omega(\log n)$ is one of those studied
by Raghavan [50] and Raghavan and Thompson [51, 52]. The problem is to find a solution
to a unit-capacity unit-demand concurrent flow problem in which all flows must be integral.
They introduced a technique for solving this type of problem known as randomized rounding.
We describe the idea as it relates to the unit-capacity unit-demand concurrent flow problem.
First, ignore the integrality constraints and solve the resulting problem, known as the
linear-programming relaxation, using any linear-programming algorithm. Since this problem has unit
demands, the flow for each commodity is a collection of paths, each of which carries some
amount of flow between 0 and 1. Then interpret the flow on path $p$ as the probability that all
flow for that commodity is on path $p$. Using Chernoff-type bounds, Raghavan and Thompson
show that, if $\lambda^* = \Omega(\log n)$, then the resulting flow is such that

$$\lambda \le \lambda^* + O(\sqrt{\lambda^* \log n}). \qquad (3.1)$$

Raghavan later showed how to make this algorithm deterministic. The deterministic algorithm
still requires the solution of the linear-programming relaxation and a derandomized version of
the randomized rounding.
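
The rounding step itself can be sketched as follows. This is only an illustration of the idea described above, under stated assumptions: the fractional solution is assumed to be given as, for each commodity, a list of (path, flow) pairs whose flows sum to 1, each path is a tuple of edge names, and the function names are hypothetical. Solving the relaxation is not shown.

```python
import random
from collections import Counter


def round_paths(fractional, rng=random):
    """Pick one path per commodity, using its fractional flow as the probability."""
    chosen = []
    for options in fractional:
        paths = [path for path, _ in options]
        weights = [flow for _, flow in options]
        chosen.append(rng.choices(paths, weights=weights, k=1)[0])
    return chosen


def congestion(chosen):
    """Maximum number of chosen paths using any single edge (unit capacities)."""
    load = Counter(edge for path in chosen for edge in path)
    return max(load.values())
```

Each commodity ends up entirely on one path, so the rounded flow is integral; the congestion of the result is the quantity that bound (3.1) controls.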

Our algorithms can be used to replace the linear-programming step, thereby giving a faster
algorithm for the problem. However, we can use our techniques to achieve a more interesting
and efficient result. By slightly modifying algorithm ScalingUnit, we obtain an algorithm
that finds an integral flow satisfying (3.1) directly. In other words, we do not need to
solve the linear-programming relaxation and perform the rounding. Our modification to algorithm
ScalingUnit provides an alternative proof of Raghavan and Thompson's result [51]. It also
yields a significantly faster algorithm, since the time-consuming step is now approximately
solving a concurrent flow problem, rather than exactly solving a linear program.

To find such an integral flow, we simply run algorithm ScalingUnit with the modification
that we never allow any of the $\sigma_i$ to become non-integral. With this modification, $d_i = 1$,
$i = 1, \ldots, k$, and therefore $\sigma_i = 1$, $i = 1, \ldots, k$. Thus, we never allow the step $\sigma_i \leftarrow \sigma_i/2$ in Unit or
ScalingUnit to be executed. Consequently, if the granularity condition (2.23) becomes false,
we terminate the algorithm. We now show that this algorithm gives an integral solution that
is sufficiently close to optimal.

Theorem 3.2.1 Assume we run algorithm ScalingUnit on a concurrent flow problem in which
all $d_i = 1$, $i = 1, \ldots, k$, but maintain that the flows are integral by terminating whenever we would
have executed the step $\sigma_i \leftarrow \sigma_i/2$. If $\lambda^* = \Omega(\log n)$, then this algorithm yields an integral solution
with $\lambda = \lambda^* + O(\sqrt{\lambda^* \log n})$.

Proof: ScalingUnit begins with a call to Unit. The call to Unit can terminate in two ways:
either with an $\epsilon$-optimal flow for the constant $\epsilon$ used in that call, or because granularity
condition (2.23) would become false if $\sigma_i$ were divided by 2. First suppose the call to Unit
terminates because the granularity condition becomes false. At this point, we have


1  2 

2  ~  ^  31og(mf~‘) 


<  1, 


(3.2) 


where  r  is  the  target  value  for  A.  In  particular,  we  have  t  >  A/2  and  f  and  therefore 

$\lambda = O(\log n)$. By our assumption $\lambda^* = \Omega(\log n)$, and thus $\lambda \le \lambda^* + O(\sqrt{\lambda^* \log n})$.

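Now assume that the call to Unit terminates with an $\epsilon$-optimal flow for the constant $\epsilon$ used
in that call. We proceed with ScalingUnit. It terminates when the granularity condition becomes
false, at which point inequality (3.2) implies that $\epsilon = O(\sqrt{(\log m)/\tau})$. The flow $f$ is $\epsilon$-optimal
and integral. So $\lambda \le (1 + \epsilon)\lambda^* \le \lambda^* + O(\lambda^* \sqrt{(\log m)/\tau})$. Since $\tau \ge \lambda/2 \ge \lambda^*/2$, this bound on
$\lambda$ is at most $\lambda^* + O(\sqrt{\lambda^* \log m})$, as required. ■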

Observe that Theorem 3.2.1 gives a direct proof of the theorem of Raghavan and Thompson,
in the sense that in order to find an integral flow, we never need to resort to a fractional flow as
an intermediate step. Not only does it give a direct proof, but it also gives a faster algorithm.
On any input, the modified integral algorithm does no more work than the original version of
ScalingUnit. Hence, the running times of Theorem 2.5.15 apply. We can derive even faster
running times, however, by reanalyzing the algorithm for this special case.

Theorem 3.2.2 If $\lambda^* = \Omega(\log m)$, a flow such that $\lambda \le \lambda^* + O(\sqrt{\lambda^* \log n})$ can be found
by a randomized algorithm in expected time $O(km \log k + k^{3/2}(m + n \log n)/\sqrt{\log n})$, and by a
deterministic algorithm in time $O(k \log k \,(k^* n \log n + m k^* + m \log n))$.

Proof: We have shown that algorithm ScalingUnit finds the required routing if it is
terminated as soon as the granularity condition becomes false with $\sigma = 1$. Now we analyze the time
required.

We begin with a call to Unit. Recall that Unit repeatedly calls Reduce, and each call
reduces $\lambda$ by a factor of 2. Since each $d_i = 1$ and there are $k$ commodities, throughout the
algorithm $1 \le \lambda \le k$. Thus, there are $O(\log k)$ calls to Reduce. By Theorem 2.5.5, the number of
calls to FindPath made by each call to Reduce is, in this case,
$O(\epsilon^{-1}\max_i \min\{k, k\}/1) = O(\epsilon^{-1}k)$. Hence, Unit consists of $O(k\log k)$ calls to FindPath
when $\epsilon$ is constant. The remainder of ScalingUnit consists of a series of calls to Reduce,
each with $\epsilon$ decreasing by a factor of 2. Hence, the time for the series of calls in the randomized
implementation is dominated by the time for the last iteration. The last iteration consists of
$O(k)$ calls to FindPath with $\epsilon = O(\sqrt{\log m/\tau})$. Since the total amount of flow in the network
is $k$, we have $\tau = O(\lambda) = O(k)$, and thus $\epsilon^{-1} = O(\sqrt{k/\log n})$. In the deterministic
implementation, we use the fact that there are at most $O(\log\sqrt{k/\log n}) = O(\log k)$ iterations, for a total
bound of $O(k\log k)$ calls to FindPath.

To derive the running times, we just need to plug in the time for FindPath. Using Lemma
2.5.8, Lemma 2.5.10, Theorem 2.5.12 and Lemma 2.5.13 for the times for FindPath yields the
theorem. ■

3.2.1 Applications to VLSI Routing

In  this  section,  we  discuss  the  problem  of  approximately  minimizing  channel  width  in  VLSI 
routing.  Often,  a  VLSI  design  consists  of  a  collection  of  modules  separated  by  channels.  The 
modules are connected by wires that are routed through the channels. For purposes of
regularity the channels have uniform width. It is desirable to minimize that width in order to
minimize the total area of the VLSI circuit. Raghavan and Thompson [51] give an approximation
algorithm for minimizing the channel width. They model the problem as a graph problem
in which one must route wires between pairs of nodes in a graph $G$ so as to minimize the
maximum number of wires routed through an edge. To approximately solve the problem, they
first solve a concurrent flow problem where there is a commodity with demand 1 for each path
that needs to be routed. An optimal solution $f_{opt}$ fails to be a wire routing only in that it
may consist of paths of fractional flow. The value of $|f_{opt}|$ is certainly a lower bound on the
minimum channel width. Raghavan and Thompson give a randomized method for converting
the fractional flow $f_{opt}$ to an integral flow, increasing the channel width only slightly. The
resulting wire routing $f$ achieves channel width at most

\[
|f_{opt}| + O\bigl(\sqrt{|f_{opt}| \log n}\bigr) \tag{3.3}
\]

which is at most $w_{min} + O(\sqrt{w_{min} \log n})$, where $w_{min}$ is the minimum width. In fact, the constant
implicit in this bound is quite small. Later Raghavan [49] showed how this conversion method
can be made deterministic.
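The conversion step described above can be illustrated with a minimal sketch of randomized rounding. It assumes the fractional solution is given as a path decomposition: for each unit-demand commodity, a list of (path, weight) pairs whose weights sum to 1. The function names, the data layout, and the toy instance are illustrative assumptions and are not taken from Raghavan and Thompson's paper.

```python
import random
from collections import Counter

def round_paths(fractional, rng):
    """Pick one path per commodity, with probability equal to its fractional weight.

    fractional: list (one entry per commodity) of [(path, weight), ...],
    where a path is a tuple of edge names and the weights sum to 1.
    """
    chosen = []
    for options in fractional:
        paths = [p for p, _ in options]
        weights = [w for _, w in options]
        chosen.append(rng.choices(paths, weights=weights, k=1)[0])
    return chosen

def channel_width(paths):
    """Maximum number of chosen paths that share a single edge."""
    load = Counter(e for p in paths for e in p)
    return max(load.values()) if load else 0

def fractional_width(fractional):
    """Maximum fractional load on an edge (the value |f_opt| in the text)."""
    load = Counter()
    for options in fractional:
        for path, w in options:
            for e in path:
                load[e] += w
    return max(load.values()) if load else 0.0

# Hypothetical toy instance: two wires, each split evenly over two routes.
frac = [
    [(("a", "b"), 0.5), (("c", "d"), 0.5)],
    [(("a", "e"), 0.5), (("c", "f"), 0.5)],
]
rounded = round_paths(frac, random.Random(0))
```

On this instance the fractional width is 1, and the rounded routing has width 1 or 2 depending on the random choices, which is the kind of small additive increase that bound (3.3) quantifies.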

Using  Theorem  3.2.1,  we  can  directly  obtain  an  integral  flow  satisfying  (3.3)  and  thus 
solve  the  channel  routing  problem.  This  method  is  much  faster  than  the  original  method  of 
Raghavan  and  Thompson.  Our  method  does  have  a  somewhat  larger  constant  hidden  in  the 
big-O of equation (3.3). However, by changing the constant in the granularity condition, we
can get a solution with a much smaller constant than the one given here, although not as small
as that of Raghavan and Thompson. Obtaining a smaller constant in the quality of the
approximation increases the running time of the algorithm by a constant factor.

3.3  Sparse  Cuts 

Another application of our concurrent flow algorithms is finding sparse cuts in graphs. The
computational bottleneck of these algorithms is solving a concurrent flow problem and its linear
programming dual. First, we summarize the previous sparse cut approximation results. Then
we show that our concurrent flow algorithm can be used to find an approximately optimal dual
solution to the corresponding concurrent flow problems, in addition to finding a near-optimal
flow. Finally, we give even faster running times for the special case of the sparse cut
problem where the input graph $G$ has low maximum degree.

3.3.1  Review  of  Previous  Results  on  Sparse  Cuts 

We  begin  by  motivating  the  need  for  finding  sparse  cuts.  One  good  method  for  solving  graph 
problems is to use a divide-and-conquer algorithm, which involves dividing the graph into two
pieces,  solving  the  two  pieces  separately,  and  then  patching  the  results  back  together.  In  many 
graph  problems,  the  number  of  nodes  determines  the  difficulty  of  solving  a  problem,  and  the 
number  of  edges  going  between  the  two  pieces  of  the  graph  determines  the  cost  of  patching  the 
problem  together.  It  is  therefore  desirable  to  split  the  graph  into  two  roughly  equal  pieces,  such 
that  the  number  of  edges  going  between  the  two  sides  is  small.  Thus,  to  every  (bi)partition  of 
the  nodes  of  a  graph,  we  can  assign  a  value  that  measures  how  well  we  have  achieved  this  goal. 
While  there  are  many  ways  we  can  formulate  this  metric,  we  describe  one  that  is  particularly 
useful. 

Let $G$ be an undirected graph with capacities $u$ on its edges. For a subset of the nodes $A$,
we use $\bar A$ to denote the complement of $A$. The associated cut is the set of edges $\Gamma(A)$ with one
endpoint in $A$ and the other in $\bar A$. Let $u(\Gamma(A))$ denote the sum of the capacities of the edges
in the cut. The metric we use is

\[
\beta = u(\Gamma(A)) / (|A||\bar A|).
\]

This value, $\beta$, is small when the number of edges crossing the cut is small and when the two
sides are balanced. Leighton and Rao [43] gave an $O(\log n)$-approximation algorithm for the
problem of minimizing the ratio $\beta = u(\Gamma(A))/(|A||\bar A|)$ over all cuts.
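To make the metric concrete, the following sketch evaluates $\beta$ for a given node subset and finds the minimum by brute force on a tiny graph. The exhaustive search is exponential and is only an illustration of the definition; it is not the Leighton and Rao algorithm. The function names and the example graph are assumptions made for this sketch.

```python
from itertools import combinations

def cut_ratio(nodes, cap, A):
    """beta = u(Gamma(A)) / (|A| * |complement of A|) for an undirected graph.

    cap maps an edge (v, w) to its capacity; A is a proper nonempty subset of nodes.
    """
    A = set(A)
    comp = set(nodes) - A
    crossing = sum(c for (v, w), c in cap.items() if (v in A) != (w in A))
    return crossing / (len(A) * len(comp))

def sparsest_cut_bruteforce(nodes, cap):
    """Try every subset of size at most n/2; feasible only for very small graphs."""
    nodes = list(nodes)
    best = None
    for r in range(1, len(nodes) // 2 + 1):
        for A in combinations(nodes, r):
            val = cut_ratio(nodes, cap, A)
            if best is None or val < best[0]:
                best = (val, set(A))
    return best

# Two unit-capacity triangles joined by a single edge.
nodes = range(6)
cap = {(0, 1): 1, (1, 2): 1, (0, 2): 1,
       (3, 4): 1, (4, 5): 1, (3, 5): 1,
       (2, 3): 1}
best_value, best_side = sparsest_cut_bruteforce(nodes, cap)
```

Here the best cut separates the two triangles: one edge crosses and both sides have three nodes, so $\beta = 1/9$.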

Given  the  ability  to  find  such  cuts,  many  problems  have  been  solved  by  using  a  divide- 
and-conquer  approach  in  the  manner  described  in  the  first  paragraph  of  this  subsection.  In 
particular,  this  approach  has  yielded  the  first  polylog-times-optimal  approximation  algorithms 
for  a  wide  variety  of  NP-complete  graph  problems.  Leighton  and  Rao  [43]  showed  how  to  use 
these techniques to find approximately balanced separators. Combining the result of Leighton
and Rao with the results of Bhatt and Leighton [9], we obtain algorithms to approximate the
minimum cut linear arrangement, minimum area layout and $\sqrt{2}$-bifurcators of a graph. They
also  showed  how  to  approximate  the  minimum  feedback-arc  set.  Hansen  [27]  has  shown  how 
to  extend  these  results  to  approximate  some  graph  embedding  problems,  and  Makedon  and 
Tragoudas  [46]  have  extended  some  of  these  results  to  hypergraphs. 

Consider the concurrent flow problem on $G$ with one unit of demand between each pair of
nodes. The optimum value $\lambda^*$ must satisfy

\[
\lambda^* \cdot u(\Gamma(A)) \ge d(A, \bar A) = |A||\bar A| \tag{3.4}
\]

for each cut $\Gamma(A)$, where $d(A, \bar A)$ denotes the sum of all demands across the cut. Therefore, the
minimum value of $u(\Gamma(A))/(|A||\bar A|)$ over all cuts $\Gamma(A)$ gives an upper bound on $1/\lambda^*$. Leighton
and Rao show that this minimum is within an $O(\log n)$ factor of the value $1/\lambda^*$. Their algorithm
to find approximately sparsest cuts makes use of this connection. More precisely, given a nearly
optimal length function (dual variables), they show how to find a partition $A \cup B$ that is within
a factor of $O(\log n)$ of the minimum value of $\lambda$, and hence of the value of the sparsest cut.

The computational bottleneck of the Leighton and Rao algorithm is computing a nearly
optimal $\lambda$ and the corresponding near-optimal linear programming dual solution for the concurrent
flow problem on $G$ with one unit of demand between each pair of nodes. The dual solution is a
non-negative length function $\ell$ that maximizes the ratio
$\bigl(\sum_{v,w \in V} \mathrm{dist}_\ell(v, w)\bigr) / \bigl(\sum_{vw \in E} u(vw)\ell(vw)\bigr)$
(see Theorem 2.2.2). Linear programming duality implies that this maximum is equal to $\lambda^*$.
Leighton and Rao use a linear programming algorithm to find the length function.
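Since the maximum of this ratio equals $\lambda^*$, every non-negative length function yields a lower bound on the optimal congestion. The sketch below evaluates the ratio for a given length function, assuming one unit of demand per unordered pair of nodes and an undirected graph; the helper names are illustrative and the all-pairs distances are computed with the Floyd-Warshall recurrence purely for brevity.

```python
def all_pairs_dist(n, length):
    """Floyd-Warshall shortest-path distances for an undirected graph on nodes 0..n-1."""
    inf = float("inf")
    d = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for (v, w), l in length.items():
        d[v][w] = min(d[v][w], l)
        d[w][v] = min(d[w][v], l)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def dual_ratio(n, cap, length):
    """(sum of pairwise distances) / (sum over edges of capacity * length)."""
    d = all_pairs_dist(n, length)
    numerator = sum(d[v][w] for v in range(n) for w in range(v + 1, n))
    denominator = sum(cap[e] * length[e] for e in cap)
    return numerator / denominator

# Path 0 - 1 - 2 with unit capacities and unit lengths.
cap = {(0, 1): 1.0, (1, 2): 1.0}
unit_length = {e: 1.0 for e in cap}
bound = dual_ratio(3, cap, unit_length)
```

On the three-node path the ratio is $4/2 = 2$, which matches the optimal congestion: edge $(0,1)$ must carry the demands of the pairs $\{0,1\}$ and $\{0,2\}$.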

A natural extension is the problem where we are given nonnegative node weights $\nu(v)$ for
$v \in V$ in addition to the capacities on the edges. For a subset $X$ of $V$ let $\nu(X)$ denote the
sum of the weights on the nodes in $X$. Consider the extension of the sparsest cut problem to
minimizing $u(\Gamma(A))/(\nu(A)\nu(\bar A))$ over all cuts. The Leighton and Rao algorithm can be extended
to give an $O(\log n)$-approximation algorithm for this problem. The corresponding concurrent
flow problem has demand between each pair of nodes, where the demand $d(s, t)$ between nodes
$s$ and $t$ equals $\nu(s)\nu(t)$. (If the weights are scaled so that the total node-weight is $n$, then the
main change to the Leighton and Rao algorithm is to select the node $s$ for starting a tree with
$\nu(s)$ maximum.)

Klein, Agrawal, Ravi, and Rao [33] extended the Leighton and Rao results to the case
of simple concurrent flow problems with integral capacities and arbitrary integral demands.
For a source-sink pair $(s, t)$, let $d(s, t)$ denote the corresponding demand. The minimum ratio
cut problem is to minimize the ratio $u(\Gamma(A))/d(A, \bar A)$ over all cuts. By inequality (3.4), the
minimum value of the ratio $u(\Gamma(A))/d(A, \bar A)$ is an upper bound on $1/\lambda^*$. Klein, Agrawal, Ravi,
and Rao [33] proved that this upper bound is at most a factor of $O(\log nU \log kD)$ more than
$1/\lambda^*$ in general, and they gave an $O(\log nU \log kD)$-approximation algorithm for the minimum
ratio cut problem, where $U$ is the maximum capacity and $D$ is the maximum demand. Tragoudas [66]
has observed that their algorithm can be modified to give an $O(\log n \log kD)$ factor instead.

Using this result, Klein et al. give approximation algorithms for chordalization of a graph,
register sufficiency, minimum deletion of clauses in a 2CNF$\equiv$ formula, via minimization,
and the edge-deletion graph bipartization problems. Later Ravi, Agrawal and Klein [53] used
these techniques to give approximation algorithms for interval graph completion and a
single-processor scheduling problem. As in the Leighton-Rao algorithm, the computational
bottleneck of their algorithm is solving the dual of the concurrent flow problem, i.e., finding
a length function $\ell$ such that the ratio
$\bigl(\sum_{s,t \in V} d(s,t)\,\mathrm{dist}_\ell(s,t)\bigr) / \bigl(\sum_{vw \in E} \ell(vw)u(vw)\bigr)$
is close to the maximum.

3.3.2  Speeding  up  the  Unit-Capacity  Case 

The  computational  bottleneck  of  the  method  of  Leighton  and  Rao  is  solving  a  unit-capacity 
concurrent  flow  problem  in  which  there  is  a  demand  of  1  between  each  pair  of  nodes.  In  their 
paper,  they  appealed  to  the  fact  that  the  concurrent  flow  problem  can  be  formulated  as  a  linear 
program,  and  hence  can  be  solved  in  polynomial  time.  A  much  more  efficient  approach  is  to 
use our unit-capacity approximation algorithm. The number of commodities required is $O(n^2)$.
Leighton [40] has discovered a technique to reduce the number of commodities required. He
shows that if the graph in which there is an edge connecting each source-sink pair is an expander
graph, then the resulting flow problem suffices for the purpose of finding an approximately
sparsest cut. (We call this graph the demand graph.) In an expander we have:

    For any partition of the node set into $A$ and $B$, where $|A| \le |B|$, the number of
    commodities crossing the associated cut is $\Theta(|A|)$.

Therefore, for this smaller flow problem $\lambda = \Omega(|A|/u(\Gamma(A)))$. Since $|B| \ge n/2$, it follows that
$n\lambda = \Omega(|A||B|/u(\Gamma(A)))$. The smaller flow problem essentially "simulates" the original
all-pairs problem. Moreover, Leighton and Rao's sparsest-cut algorithm can start with the length
function for the smaller flow problem in place of that for the all-pairs problem. Thus Leighton's
idea allows one to find an approximately sparsest cut after solving a much smaller concurrent
flow problem. If one is willing to tolerate a small probability of error in the approximation, one
can use $O(n)$ randomly selected source-sink pairs for the commodities. It is well known how
to randomly select node pairs so that, with high probability, the resulting demand graph is an
expander.
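The text does not say which random construction it has in mind. One standard choice, assumed here purely for illustration, is to take the union of a constant number of uniformly random permutations of the nodes and create a commodity between each node and its image; this yields $O(n)$ source-sink pairs. The sketch below only generates the pairs and does not verify expansion.

```python
import random

def random_demand_pairs(n, d, rng):
    """Union of d random permutations of nodes 0..n-1 as a set of source-sink pairs.

    Returns at most d * n unordered pairs (v, w) with v < w. Fixed points of a
    permutation are skipped and duplicate pairs are merged.
    """
    nodes = list(range(n))
    pairs = set()
    for _ in range(d):
        perm = nodes[:]
        rng.shuffle(perm)
        for v, w in zip(nodes, perm):
            if v != w:
                pairs.add((min(v, w), max(v, w)))
    return sorted(pairs)

pairs = random_demand_pairs(50, 3, random.Random(0))
```

With $d$ constant, the number of commodities is linear in $n$, in contrast to the $\Theta(n^2)$ commodities of the all-pairs problem.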

By  Theorem  2.5.15,  algorithm  Unit  takes  expected  time  0{m?  log^  m)  to  find  an  appropriate 
solution  for  this  smaller  problem.  We  then  find  a  dual  solution  and  run  the  rest  of  the  algorithm 
of  Leighton-Rao.  The  dominant  step,  however,  is  the  solution  of  the  concurrent  flow  problem. 

Theorem  3.3.1  An  0(log  n)-factor  approximation  to  the  sparsest  cut  in  a  graph  can  be  found 
by  a  randomized  algorithm  in  O(m^log^  m)  time.  ■ 

3.3.3  Speeding  up  the  General  Case 
Finding  Good  Dual  Solutions 

The  algorithms  for  finding  sparse  cuts  in  node-weighted  edge-weighted  graphs  were  discovered 
by  Klein,  Agrawal,  Ravi  and  Rao  [33].  Similar  to  the  algorithm  of  Leighton  and  Rao  for 
the  unit  capacity  case,  they  first  approximately  solved  a  concurrent  flow  problem  and  then 
used  the  edge  lengths  (dual  variables)  to  guide  the  second  phase  of  their  algorithm.  The 
time  consuming  step  of  their  algorithm  is  to  solve  their  concurrent  flow  problem  and  find  the 
dual  variables.  They  relied  on  the  fact  that  a  concurrent  flow  problem  can  be  solved  via 
linear  programming.  In  this  section,  we  will  show  how  to  find  faster  solutions  using  algorithm 
ScalingConcurrent. Unfortunately, algorithm ScalingConcurrent returns an $\epsilon$-optimal
solution for the formulation of the concurrent flow problem in which optimality is measured in
terms of minimum-cost flows. In other words, assume for a moment an infinite precision model
of computation. Then, by Lemma 2.2.4, the flow $f$ and length function $\ell$ returned by algorithm
ScalingConcurrent on an instance for which the optimal congestion is $\lambda^*$ are such that:

\[
\frac{\sum_{i=1}^{k} C_i^*(\lambda)}{\sum_{vw \in E} \ell(vw)u(vw)} \ge \frac{\lambda^*}{1+\epsilon}. \tag{3.5}
\]

The algorithm of Klein et al. needs a length function $\ell$ such that, for some constant $c > 0$, this
function satisfies

\[
R(\ell) = \frac{\sum_{s,t \in V} d(s,t)\,\mathrm{dist}_\ell(s,t)}{\sum_{vw \in E} \ell(vw)u(vw)} \ge \frac{\lambda^*}{c}. \tag{3.6}
\]

In order to obtain one, we use the algorithm ScalingConcurrent to find a length function $\tilde\ell$. We
show that with respect to this length function, we do not necessarily satisfy (3.5), but we can
show that

\[
\frac{\sum_{s} \tilde C_s^*(\lambda)}{\sum_{vw \in E} \tilde\ell(vw)u(vw)}
\]

is "close" to $\lambda^*$. We then show how to modify this length function so that it satisfies (3.6).

Throughout this section we refer to the complementary slackness conditions for the
minimum-cost flow problem. These were given in Section 2.2, but we restate them here. Given an instance
$M$ of a minimum-cost flow problem and a feasible flow $f_i$, the flow $f_i$ is optimal if and only if there
exists a price function $p$ such that

\[
c_p(vw) \ge 0 \quad \forall vw \in E_{f_i}, \tag{3.7}
\]

where $E_{f_i}$ is the set of edges in the residual graph $G_{f_i}$.

First, we consider the concurrent flow problem that directly corresponds to the given
minimum-ratio cut problem. We combine all commodities that share a source into a single
commodity as suggested in Lemma 2.2.1, which decreases the number of commodities to
$k^* \le n$. We shall index the resulting commodities by their sources. Given an error target
$\epsilon$, if our concurrent flow algorithm used the exact length function $\ell$, it would compute a flow
satisfying capacities $\lambda \cdot u(vw)$ such that

\[
Q = \frac{\sum_{s} C_s^*(\lambda)}{\sum_{vw \in E} \ell(vw)u(vw)} \ge \frac{\lambda^*}{1+\epsilon}.
\]

But we actually compute flows with respect to an approximate length function $\tilde\ell$, described
in the proof of Lemma 2.4.27. Let $\tilde Q$ denote the corresponding ratio with $\ell$ replaced by $\tilde\ell$ and
$C_s^*$ replaced by $\tilde C_s^*$. First we show that $\tilde Q$ is almost as close to $\lambda^*$ as $Q$.

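Lemma 3.3.2 Let $f$ be the flow and $\tilde\ell$ be the length function returned by algorithm
ScalingConcurrent. Then $\tilde Q \ge \lambda^*/(1+2\epsilon)$.

Proof: Let $\gamma$ be the factor that approximately relates the real lengths to the approximate
lengths. By the way the approximate lengths were computed,
$\gamma\tilde\ell(vw) \le \ell(vw)$ for each edge $vw \in E$. Also, by arguments similar to those used to derive
(2.22) we have

\[
C_s^*(\lambda) - \gamma\tilde C_s^*(\lambda) \le \frac{\epsilon\lambda^*}{4k} \sum_{vw \in E} \ell(vw)u(vw).
\]

Using these two facts, we obtain the following bound on $\tilde Q$:

\begin{align*}
\tilde Q &= \frac{\gamma\sum_s \tilde C_s^*(\lambda)}{\sum_{vw \in E} \gamma\tilde\ell(vw)u(vw)} \\
 &\ge \frac{\gamma\sum_s \tilde C_s^*(\lambda)}{\sum_{vw \in E} \ell(vw)u(vw)} \\
 &\ge \frac{\sum_s C_s^*(\lambda) - \epsilon\lambda^*\sum_{vw \in E} \ell(vw)u(vw)/4}{\sum_{vw \in E} \ell(vw)u(vw)} \\
 &\ge \frac{\lambda^*}{1+\epsilon} - \frac{\epsilon\lambda^*}{4} \\
 &\ge \frac{\lambda^*}{1+2\epsilon}.
\end{align*}

The last inequality follows from the upper bound on $\epsilon$; it holds whenever $\epsilon \le 1/2$. ■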

Now, we describe how to modify this length function to produce one that satisfies inequality
(3.6). Setting $\bar\ell = \tilde\ell$ does not necessarily work, since $\sum_{s,t \in V} d(s,t)\,\mathrm{dist}_{\tilde\ell}(s,t)$ might be significantly
smaller than $\sum_s \tilde C_s^*(\lambda)$. Instead of using $\tilde\ell$ directly, we compute a new length function $\bar\ell$.

Let $\lambda$ be the congestion of the flow returned by algorithm ScalingConcurrent. For each
commodity $s$, we compute a minimum-cost flow for instance $M_s = (G, \lambda \cdot u, \tilde\ell, d_s)$, that is, a
flow that is minimum with respect to costs $\tilde\ell$ and capacities $\lambda \cdot u(vw)$ for each commodity. We
then use the optimal price function $p_s$ (dual variables from the minimum-cost flow) to adjust $\tilde\ell$
by adding to it the sum of the absolute values of reduced costs for edges with negative reduced
costs.

Let $f_s^*$ be the minimum-cost flow for instance $M_s = (G, \lambda \cdot u, \tilde\ell, d_s)$, and let $p_s$ be the optimal
price function for this instance. We use $\ell_s(vw)$ to denote the absolute value of the reduced cost
of edge $vw$ if it is negative, and zero otherwise, i.e.,

\[
\ell_s(vw) = -\min\{0,\ \tilde\ell(vw) + p_s(v) - p_s(w)\}. \tag{3.8}
\]

The complementary slackness conditions for the minimum-cost flow problem (3.7) imply that
if $\ell_s(vw) > 0$ then $f_s^*(vw) = \lambda u(vw)$. We define the new length function as
$\bar\ell(vw) = \tilde\ell(vw) + \sum_s \ell_s(vw)$. We need the following lemma to estimate the numerator of $R(\bar\ell)$.
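The adjustment in (3.8) is mechanical once the price functions are known. The sketch below assumes directed edges stored as (v, w) keys, one price dictionary per source, and reads the new length function as the approximate length plus the penalties summed over all sources, which is an interpretation of the definition above. The minimum-cost flow computation that would produce the prices is not shown, and all names are illustrative.

```python
def penalty_lengths(length, price):
    """l_s(vw) = -min(0, length(vw) + p_s(v) - p_s(w)), as in (3.8)."""
    return {(v, w): -min(0.0, l + price[v] - price[w])
            for (v, w), l in length.items()}

def adjusted_length(length, prices_by_source):
    """Approximate length plus, for every source, the magnitude of each
    negative reduced cost (interpreting the new length as a sum over sources)."""
    new_length = dict(length)
    for price in prices_by_source.values():
        for edge, extra in penalty_lengths(length, price).items():
            new_length[edge] += extra
    return new_length

# Hypothetical data: two directed edges and the prices for a single source 0.
length = {(0, 1): 1.0, (1, 2): 1.0}
prices = {0: {0: 0.0, 1: 3.0, 2: 4.0}}
penalties = penalty_lengths(length, prices[0])
new_length = adjusted_length(length, prices)
```

Edge (0, 1) has reduced cost $1 + 0 - 3 = -2$, so its length is raised by 2; after the adjustment no edge has negative reduced cost with respect to these prices.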

Lemma 3.3.3 Let $f_s^*$ be the minimum-cost flow for instance $M_s = (G, \lambda \cdot u, \tilde\ell, d_s)$, and let $p_s$
be the optimal price function for this instance. Then $f_s^*$ is also the minimum-cost flow for instance
$M_s' = (G, \lambda \cdot u, \tilde\ell + \ell_s, d_s)$. Further, the cost of $f_s^*$ in instance $M_s'$ is

\[
\sum_{vw \in E} (\tilde\ell(vw) + \ell_s(vw)) f_s^*(vw) = \sum_{t} d(s,t)\,\mathrm{dist}_{\tilde\ell+\ell_s}(s,t).
\]

Proof: We prove the optimality of $f_s^*$ by showing that $f_s^*$ and the price function $p_s$ satisfy the
complementary slackness conditions (3.7) for instance $M_s'$. By the definition of $\ell_s$ we have that
the reduced cost of edge $vw$, $\tilde\ell(vw) + \ell_s(vw) + p_s(v) - p_s(w)$, is nonnegative and it is positive
if and only if $\tilde\ell(vw) + p_s(v) - p_s(w)$ is positive. By applying the complementary slackness
conditions to cost $\tilde\ell$, flow $f_s^*$ and prices $p_s$, we see that if this value is positive, then $f_s^*$ is zero,
and therefore, $f_s^*$ is minimum-cost for instance $M_s'$.

Now consider the cost of $f_s^*$ subject to the cost function $\tilde\ell + \ell_s$. There are no edges with
negative reduced cost, and therefore the cost of the flow is at least $\sum_t d(s,t)(p_s(t) - p_s(s))$. All
edges that carry flow have zero reduced cost, which implies that the cost of the flow is equal to
$\sum_t d(s,t)(p_s(t) - p_s(s))$ and $p_s(t) - p_s(s) = \mathrm{dist}_{\tilde\ell+\ell_s}(s,t)$. ■

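Theorem 3.3.4 $R(\bar\ell) \ge \lambda^*/(1+2\epsilon)$.

Proof: We shall estimate the numerator of $R(\bar\ell)$ using Lemma 3.3.3. Since $\bar\ell \ge \tilde\ell + \ell_s$, for a
source $s$ we have that

\begin{align*}
\sum_{t} d(s,t)\,\mathrm{dist}_{\bar\ell}(s,t) &\ge \sum_{vw} (\tilde\ell(vw) + \ell_s(vw)) f_s^*(vw) \\
 &= \tilde C_s^*(\lambda) + \sum_{vw} \ell_s(vw) f_s^*(vw).
\end{align*}

By the complementary slackness conditions and the definition of $\ell_s$ in (3.8), we find that if
$\ell_s(vw) \ne 0$, then $f_s^*(vw) = \lambda u(vw)$. Summing over all sources yields

\[
\sum_{s,t \in V} d(s,t)\,\mathrm{dist}_{\bar\ell}(s,t) \ge \sum_s \tilde C_s^*(\lambda) + \lambda \sum_{vw} \sum_s \ell_s(vw)u(vw).
\]

Dividing the two sides of this inequality by $\sum_{vw} \bar\ell(vw)u(vw)$ we have

\[
R(\bar\ell) \ge \frac{\sum_s \tilde C_s^*(\lambda) + \lambda \sum_{vw} \sum_s \ell_s(vw)u(vw)}{\sum_{vw} u(vw)\tilde\ell(vw) + \sum_{vw} \sum_s \ell_s(vw)u(vw)}. \tag{3.9}
\]

Applying the fact that for positive $a$, $b$, $x$ and $\lambda$, if $a/b \le \lambda$ then $(a + \lambda x)/(b + x) \ge a/b$, we see
that the right side of inequality (3.9) is at least $\tilde Q$, which by Lemma 3.3.2 is at least $\lambda^*/(1+2\epsilon)$. ■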
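The elementary fact about fractions used in the last step can be checked directly. A short derivation, for positive $a$, $b$, $x$ and any $\lambda$ with $a/b \le \lambda$:

```latex
\[
\frac{a+\lambda x}{b+x}-\frac{a}{b}
  = \frac{b(a+\lambda x)-a(b+x)}{b(b+x)}
  = \frac{x(\lambda b-a)}{b(b+x)}
  \;\ge\; 0
  \qquad\text{since } \lambda b \ge a .
\]
```

In words: adding $x$ to the denominator while adding $\lambda x$ to the numerator cannot decrease a ratio that is already at most $\lambda$.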


Thus  we  have  shown  the  following: 

Corollary 3.3.5 An $\epsilon$-optimal flow and length function pair $(f, \tilde\ell)$ produced by our concurrent
flow algorithm can be translated into a length function $\bar\ell$ needed by the minimum-ratio cut algorithms
in $O(k^* nm \log(n^2/m) \log(nU))$ time. The dual objective value associated with $\bar\ell$ is within a $1 - O(\epsilon)$
factor of the optimum.

Proof:  The  algorithm  needs  to  perform  k*  minimum-cost  flows.  The  time  for  a  minimum-cost 
flow  comes  from  Lemma  2.4.29.  ■ 

We  can  use  the  approximate  minimum-cost  flow  computation  in  Lemma  2.4.30  instead  of 
Lemma  2.4.29.  With  an  argument  similar  to  the  above,  but  somewhat  more  involved,  we  replace 
the $\log(n^2/m)$ in the theorem by a $\log\log(nU)$. We obtain the following corollary.
Corollary 3.3.6 An $O(\log n)$-approximation to the node weighted cut problem with general
capacities can be found in $O(n^2 m \log nU \log^2 n \min\{\log(n^2/m), \log\log nU\})$ expected time. An
$O(\log n \log kD)$-approximation to the minimum-ratio cut problem with general demands and
capacities can be found in $O(k^* nm \log nU \log k \log n \min\{\log(n^2/m), \log\log nU\})$ expected time.

An  analogous  theorem  can  be  obtained  for  finding  approximately  sparsest  cuts  in  hypergraphs 
using the concurrent flow algorithm in conjunction with the approximation algorithm of
Makedon and Tragoudas [46].

3.3.4  A  Faster  Algorithm  for  Low  Degree  Graphs 

In this section, we improve the running time given in Corollary 3.3.6 for low-degree graphs $G$.
The new running time depends on $\Delta$, the maximum degree of any node in the graph $G$.

We consider the following minimum-ratio cut problem for graphs with unit demands.
Problem MR1: The minimum-ratio cut problem on an instance $I$ in which each commodity $i$ has
$d_i = 1$ and the graph that has an edge between the source and sink of each commodity is a
constant-degree expander on $V$. (We call this graph the demand graph.)

While Problem MR1 may seem like an obscure special case, it is in fact an important one.
The Leighton and Rao [43] algorithm uses the solution of a concurrent flow problem in which
the demand graph is the complete graph. As mentioned in Section 3.3.2, one can modify the
Leighton and Rao algorithm to use the solution to this new concurrent flow problem and its
dual problem to derive an $O(\log n)$ approximation to the minimum ratio $u(\Gamma(A))/(|A||\bar A|)$ over
all cuts. To get an idea of how the two problems are related, consider a cut $\Gamma(A)$ and assume that
$|A| \le |\bar A|$. Since the demand graph is a constant-degree expander, $c|A| \le d(A, \bar A) \le c'|A|$ for
some constants $c$ and $c'$. Therefore,

\[
\frac{u(\Gamma(A))}{d(A, \bar A)} = \Theta\left(\frac{u(\Gamma(A))}{|A|}\right).
\]

But since by assumption $|A| \le |\bar A|$, we know that $n/2 \le |\bar A| \le n$. Therefore, $u(\Gamma(A))/d(A, \bar A)$
is $\Theta(n)$ times more than $u(\Gamma(A))/(|A||\bar A|)$.

The first step in solving Problem MR1 is to round all edge capacities up to integer multiples
of a parameter $\mu$ in such a way that the ratio $u(\Gamma(A))/(|A||\bar A|)$ is not changed by more than
a factor of two. Notice that $|\Gamma(A)| \le \Delta|A|$. We shall use $r$ to denote the maximum of
$|\Gamma(A)|/d(A, \bar A)$ over all cuts $\Gamma(A)$. Notice that $r \le \Delta/c$, where $c$ is the expansion parameter of
the demand graph.

Theorem 3.3.7 Let $\lambda^*$ be the optimum value of the concurrent flow problem MR1, and let
$\mu \le (r\lambda^*)^{-1}$. If we round each capacity $u(e)$ up to $\bar u(e)$, the next integer multiple of $\mu$, then
the minimum value of $\bar u(\Gamma(A))/(|A||\bar A|)$ over all cuts is at most twice the minimum value of
$u(\Gamma(A))/(|A||\bar A|)$ over all cuts $\Gamma(A)$.

Proof: For all cuts $\Gamma(A)$, it must be that $\lambda^* u(\Gamma(A)) \ge d(A, \bar A)$. The rounding error
$\bar u(\Gamma(A)) - u(\Gamma(A))$ is at most $\mu|\Gamma(A)| \le |\Gamma(A)|(r\lambda^*)^{-1} \le d(A, \bar A)/\lambda^* \le u(\Gamma(A))$. Thus for each cut
$\bar u(\Gamma(A))/(|A||\bar A|) \le 2u(\Gamma(A))/(|A||\bar A|)$, i.e., the new ratio is at most twice the old ratio. ■
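The rounding step itself is simple, and the key quantity in the proof, the rounding error of a cut, can be checked numerically. The sketch below rounds capacities up to multiples of a parameter `mu` and compares a cut before and after; the value of `mu` and the small graph are arbitrary illustrative choices and are not derived from $r$ and $\lambda^*$ as the theorem requires.

```python
import math

def round_up_capacities(cap, mu):
    """Round every capacity up to the next integer multiple of mu."""
    return {e: math.ceil(c / mu) * mu for e, c in cap.items()}

def cut_capacity(cap, A):
    """Total capacity of edges with exactly one endpoint in A."""
    A = set(A)
    return sum(c for (v, w), c in cap.items() if (v in A) != (w in A))

def cut_size(cap, A):
    """Number of edges with exactly one endpoint in A."""
    A = set(A)
    return sum(1 for (v, w) in cap if (v in A) != (w in A))

mu = 0.25
cap = {(0, 1): 1.1, (1, 2): 0.6, (0, 2): 2.0}
rounded = round_up_capacities(cap, mu)
error = cut_capacity(rounded, {0}) - cut_capacity(cap, {0})
```

Each edge gains less than `mu`, so the error on any cut is at most `mu` times the number of crossing edges, which is the first inequality in the proof above.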

Rounding to integer multiples of $\mu$ preserves the minimum-ratio cut up to a factor of 2. If
we want to preserve $\lambda^*$ up to a constant factor we must perform a somewhat finer rounding.

Theorem 3.3.8 Let $\lambda^*$ be the optimum value of the concurrent flow problem MR1 and let
$\mu \le \epsilon(20 r\lambda^* \log mU \log n)^{-1}$. If we round each capacity $u(e)$ up to $\bar u(e)$, the next integer multiple
of $\mu$, then the minimum congestion $\lambda^*$ subject to capacities $u$ is at most $(1+\epsilon)\bar\lambda^*$, where $\bar\lambda^*$ is
the minimum congestion subject to capacities $\bar u$.

Proof: The idea is to use the $O(\log n \log kD)$ approximation result of Klein, Agrawal, Ravi,
and Rao [33] as improved by Tragoudas [66]. Klein et al. show that the minimum value over
all cuts of $u(\Gamma(A))/d(A, \bar A)$ is within an $O(\log n \log kD)$ factor of the value of $1/\lambda^*$. Consider
the following auxiliary concurrent flow problem. The graph is $G$ with capacities $u$. For each
edge $vw \in E$ there is a demand of value $d(v, w) = \bar u(vw) - u(vw)$ from $v$ to $w$. Observe
that the demands in the auxiliary problem are integral and at most $\mu$, and $\log kD$ is at most
$2\log(mU)$. Using the same estimates as in the proof of Theorem
3.3.7 we can conclude that the minimum of $u(\Gamma(A))/d(A, \bar A)$ over all cuts $\Gamma(A)$ is at least
$20 \log mU \log n/\epsilon$. By the approximation result of Klein et al. the minimum congestion
for this problem is at most $\epsilon$. That is, the added capacities can be routed in an $\epsilon$-fraction of
the original capacities $u$.

Now consider an optimal flow $f$ of congestion $\bar\lambda^*$ in the rounded problem. To get a solution
in the original problem, we route the part of flow $f$ that uses the added capacity in the way
this demand is routed in the optimal solution to the auxiliary problem. The additional flow
does not increase the congestion by more than a factor of $1 + \epsilon$. ■

Next consider the question of how long it takes to solve a rounded concurrent flow problem.
For simplicity we shall restrict our attention to the case when $\epsilon$ is a constant. The number of
commodities is $O(n)$. The capacities in the minimum-cost flow problem are integer multiples
of $\lambda\mu$. We shall use algorithm ScalingConcurrent with a suitable choice of minimum-cost
flow routine. To solve these problems we use the minimum-cost flow algorithm due to Ford and
Fulkerson [29] and Yakovleva [71], which repeatedly augments the flow along the shortest path
in the residual graph. Given a concurrent flow with congestion $\lambda$, the number of shortest
path computations in a minimum-cost flow subroutine is at most the demand divided by the
unit of capacity, rounded up; that is, the number of shortest-path computations is
$O(\lambda^{-1}\mu^{-1} + 1)$.

We  use  these  ideas  to  solve  the  minimum-ratio  cut  and  the  concurrent  flow  problem.  The 
$O((\lambda^{-1}\mu^{-1} + 1)(m + n\log n))$ time required for solving the minimum-cost flow problem might
not dominate the $O(m \log n)$ needed to compute the approximate length function. To simplify
the bounds we shall count each minimum-cost flow computation as $O((\lambda^{-1}\mu^{-1} + 1)m\log n)$
time.  These  bounds  can  be  further  improved  by  using  the  data  structures  described  in  Section 
2.5.5,  but  we  do  not  pursue  that  here. 

The running time that we wish to achieve is smaller than the time it takes to find an initial
flow using the $k$ maximum-flow computations suggested in Lemma 2.4.2. The capacities of this
problem are not rounded, therefore we have to use a general maximum-flow algorithm. All such
algorithms take, up to logarithmic factors, $\Omega(mn)$ time. An initial flow that is optimal up to a
factor of $O(km)$ can be computed in $O(km)$ time by routing each commodity on the path with
maximum bottleneck capacity from its source to its sink.
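A maximum-bottleneck (widest) path can be found with a small modification of Dijkstra's algorithm, as in the sketch below. This heap-based variant is an assumption chosen for brevity: it takes $O(m \log n)$ time per commodity, which is slightly more than the $O(m)$ per commodity implied by the bound stated above. The adjacency-list format and the function name are illustrative.

```python
import heapq

def widest_path(adj, s, t):
    """Return (bottleneck, path) for a maximum-bottleneck s-t path.

    adj maps a node to a list of (neighbor, capacity) pairs.
    Returns (0, None) if t is unreachable from s.
    """
    best = {s: float("inf")}       # best known bottleneck from s to each node
    parent = {s: None}
    heap = [(-float("inf"), s)]    # max-heap on bottleneck via negated keys
    done = set()
    while heap:
        neg_b, v = heapq.heappop(heap)
        if v in done:
            continue
        done.add(v)
        if v == t:
            break
        for w, c in adj.get(v, []):
            b = min(-neg_b, c)
            if b > best.get(w, 0):
                best[w] = b
                parent[w] = v
                heapq.heappush(heap, (-b, w))
    if t not in done:
        return 0, None
    path = []
    v = t
    while v is not None:
        path.append(v)
        v = parent[v]
    return best[t], path[::-1]

# Undirected example: two routes from 0 to 3 with bottlenecks 2 and 3.
adj = {
    0: [(1, 5), (2, 3)],
    1: [(0, 5), (3, 2)],
    2: [(0, 3), (3, 3)],
    3: [(1, 2), (2, 3)],
}
bottleneck, path = widest_path(adj, 0, 3)
```

Routing every commodity along such a path gives the crude initial flow described above, which the scaling iterations then improve.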

An iteration of the algorithm uses the rounding described in Theorem 3.3.7 with $\mu = c(\Delta\lambda_0)^{-1}$.
We terminate the iteration if $\lambda$ decreases below $\lambda_0/2$. At that point we divide $\lambda_0$ by 2, and
start the next iteration. We use the flow obtained in the previous iteration as our initial flow.

Theorem 3.3.9 An $O(\log n)$-approximation to the minimum ratio $u(\Gamma(A))/(|A||\bar A|)$ over all cuts
$\Gamma(A)$ in a graph with capacities $u$ and maximum degree $\Delta$ can be computed in $O(nm\Delta\log^3 n)$
expected time.

Proof: By Theorem 2.4.20, we need to solve $O(k \log n \log k)$ minimum-cost flow problems after
initialization. Each of them takes $O(\lambda^{-1}\mu^{-1} m\log n) = O((c^{-1}\Delta + 1)m\log n)$ time, which yields
the result. ■

The proof of the following theorem is the same as the proof of the previous theorem
with the rounding from Theorem 3.3.7 replaced by that of Theorem 3.3.8 and with
$\mu = \epsilon(20\Delta\lambda_0\log nU/\log n)^{-1}$.

Theorem 3.3.10 For any constant $\epsilon$, an $\epsilon$-optimal solution to a unit demand concurrent flow
problem in a graph with maximum degree $\Delta$ and with a constant degree expander demand-graph
can be computed in $O(nm\Delta\log^2 n\log nU)$ expected time.

In regular graphs $n\Delta = m$, and therefore the running times of the above two algorithms for
problem MRl are, up to polylogarithmic factors, $O(m^2)$.


Chapter  4 


Implementing  Multicommodity  Flow 
Algorithms* 


4.1  Introduction 

In this chapter we describe an implementation of algorithm ScalingConcurrent. In Section
4.2 we discuss some of the previous implementations of multicommodity flow algorithms.
In Section 4.3 we discuss some of the decisions we made. In Section 4.4, we analyze the running
time of our implementation and make some comparisons to a linear programming algorithm.

4.2  Previous  Results 

All previous implementations of algorithms for the concurrent flow problem with general
capacities that we are aware of rely on linear programming. An instance $M$ of the concurrent
flow problem can be expressed as the following linear program:

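*This chapter describes joint work with Tishya Leong and Peter Shor [44].

$$
\begin{aligned}
\text{minimize}\quad & \lambda \\
\text{subject to}\quad
& \sum_{wv \in E} f_i(wv) - \sum_{vw \in E} f_i(vw) = 0, && \text{for every node } v \notin \{s_i, t_i\},\ i = 1, \ldots, k \\
& \sum_{vw \in E} f_i(vw) = d_i, && \text{for } v = s_i,\ i = 1, \ldots, k \\
& \sum_{wv \in E} f_i(wv) = d_i, && \text{for } v = t_i,\ i = 1, \ldots, k \\
& \sum_{i=1}^{k} f_i(vw) \le \lambda \cdot u(vw), && \forall vw \in E \\
& f_i(vw) \ge 0, && \forall vw \in E,\ i = 1, \ldots, k
\end{aligned}
\tag{4.1}
$$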

This linear program has $O(mk)$ variables and $O(nk + m)$ constraints. Even for a graph with
average vertex degree $\Delta$, there are $O(\Delta nk + mk) = O(mk)$ non-zero entries in the constraint
matrix. Thus the size of the linear program is fairly large compared with the size of the input.
In particular, both the number of constraints and the number of variables grow linearly in $k$.
The large size of the linear programs makes the general simplex algorithm impractical for all
but very small problems. Some algorithms that take advantage of the special structure of
multicommodity flow problems have been proposed. These algorithms fall into three main classes:
price-directive decomposition, resource-directive decomposition, and partitioning approaches.
More recent approaches include interior-point methods [1] and a combinatorial scaling algorithm
[54]. All of the aforementioned algorithms solve multicommodity flow problems using one of
two different objective functions. Some find a minimum-cost multicommodity flow, while others
find a flow that maximizes the total amount of flow in the network. A detailed description of
these approaches requires a knowledge of linear programming that is beyond the scope of this
thesis. We refer the reader to the surveys of Assad [5] and Kennington [31] and the thesis of
Schneur [54] for more information on these approaches.

We are aware of two implementations of algorithms for the unit-capacity unit-demand
concurrent flow problem. Shahrokhi and Matula [59] report encouraging results for an
implementation of their algorithm. Klein, Kang and Borger [34] have implemented a variant of the
algorithm of Klein, Stein and Tardos [36], which is essentially the algorithm ScalingUnit.
Initial comparisons to the algorithm of Klein et al. show that, as expected, our algorithm performs
fewer iterations, but takes more time to perform each iteration.


4.3  An  Implementation 

We  now  describe  how  we  have  adapted  and  implemented  Algorithm  ScalingConcurrent.  We 
made some modifications to the algorithm for the purpose of improving actual performance.
We  describe  the  changes  we  have  made  and  the  motivations  behind  them.  We  also  point  out 
areas  in  which  our  modifications  could  be  fine-tuned  with  further  research.  First  we  will  focus 
on  a  few  of  the  more  interesting  and  important  aspects  of  our  implementation. 

4.3.1  Grouping  Commodities 

The  grouping  of  commodities  is  suggested  in  Lemma  2.2.1.  We  place  all  commodities  with 
the  same  source  into  one  commodity  group  and  run  the  algorithm  on  the  commodity  groups 
instead  of  on  the  individual  commodities.  Grouping  has  two  advantages.  First,  the  running 
time,  which  varies  linearly  with  k,  now  depends  on  the  number  of  commodity  groups  rather 
than  on  the  number  of  commodities.  For  problems  with  large  numbers  of  commodities,  the 
favorable  dependence  on  k  means  a  significant  reduction  in  running  time.  Second,  because  our 
algorithm uses $O(km)$ space, commodity grouping also reduces the space requirement by up to
a  factor  of  n.  In  practice,  this  advantage  probably  outweighs  the  previous  one.  We  have  been 
able  to  solve  problems  through  the  use  of  grouping  that  were  not  solvable  without  grouping, 
due  to  the  memory  limitations  of  the  particular  machine.  We  also  note  that  the  minimum-cost 
flow  code  that  we  used  is  written  to  handle  multiple  sources  and  sinks,  so  grouping  does  not 
create  any  added  complexity.  The  advantages  gained  by  grouping  commodities  have  also  been 
documented  by  Schneur  [54]. 
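
The grouping step can be sketched in a few lines of Python. This is only an illustration of the idea of merging commodities that share a source; the function name `group_by_source` and the `(source, sink, demand)` triple format are assumptions made for the example and are not taken from the implementation described here.

```python
from collections import defaultdict

def group_by_source(commodities):
    """Merge commodities that share a source into one multi-sink group.

    commodities: iterable of (source, sink, demand) triples (assumed format).
    Returns {source: {sink: total_demand}}.
    """
    groups = defaultdict(lambda: defaultdict(float))
    for source, sink, demand in commodities:
        groups[source][sink] += demand
    return {s: dict(sinks) for s, sinks in groups.items()}

commodities = [("a", "b", 2.0), ("a", "c", 1.0), ("b", "c", 4.0), ("a", "b", 0.5)]
groups = group_by_source(commodities)
```

With this representation the number of groups is at most the number of nodes, which is where the space saving of up to a factor of n comes from when there are many commodities.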

4.3.2 Choosing a Commodity to Reroute

Recall that algorithm ScalingConcurrent uses either a deterministic strategy or a
randomized strategy for choosing a commodity group (or a commodity) to reroute. Let $f_i$ be
the current flow of commodity group $i$ and let $f_i^*$ be the minimum-cost flow for problem
$M = (G, u \cdot \ldots)$. Then the deterministic method, described in Lemma 2.4.11,
computes the cost $C_i = \sum_{vw \in E} |f_i(vw)|\,\ell(vw)$ of a commodity group $i$, its minimum cost
$C_i^* = \sum_{vw \in E} |f_i^*(vw)|\,\ell(vw)$, and the difference $C_i - C_i^*$ between its cost and the minimum cost. The
commodity group to be rerouted is the first $\epsilon$-bad one found in a predetermined ordering, in
other words, the first one which has a difference $C_i - C_i^*$ greater than $\epsilon C_i + (\epsilon\lambda\Phi)/k^*$. This
method requires $k^*$ minimum-cost flow computations per iteration in the worst case. The
randomized strategy computes the cost $C_i$ of each commodity group $i$ and randomly chooses a
commodity group with probability proportional to cost. This method uses an expected $\epsilon^{-1}$
minimum-cost flow computations per iteration. Once every $k^*$ iterations, minimum-cost flows
are computed for all the commodity groups, and the congestion $\lambda$ is checked against the lower
bound to decide if the algorithm should terminate. This check increases the
number of minimum-cost flow computations by at most a factor of 2. Our selection strategy draws
from both the deterministic and the randomized methods and from the termination check.

To make the most progress per iteration, we attempt to find not only a poorly routed
commodity group but the most poorly routed commodity group. We may designate as the
most poorly routed commodity group either the group with the highest cost $C_i$ or the group
with the largest difference $C_i - C_i^*$ between cost and minimum cost. Using either measure and
rerouting larger fractions of flow than the $\sigma$ in Lemma 2.4.8, we have found that an algorithm
that deterministically reroutes the most poorly routed commodity group sometimes gets stuck
rerouting a single group over and over with no decrease in the congestion. We have also found
that when it does not get stuck, such a deterministic algorithm usually progresses faster than
a randomized algorithm. We therefore use a partly deterministic, partly randomized selection
strategy in which we alternate between $k^*/2$ iterations of deterministic selection and $k^*/2$
iterations of random selection. By taking advantage of the minimum-cost flow computations
performed in the termination check every $k^*$ iterations, we can select commodity groups to
reroute without computing extra minimum-cost flows. We reroute, in decreasing order, the
$k^*/2$ groups with the greatest difference between cost and minimum cost, followed by $k^*/2$
randomly chosen commodity groups. To prevent domination by a limited number of groups,
the random selection weights all commodity groups equally, as proposed by Goldberg [20] and
Grigoriadis and Khachiyan [26]. Note the savings over the theory here. We compute the costs
once and then perform several reroutings. By the time that we actually reroute a commodity, it
might be the case that that commodity is no longer $\epsilon$-bad. The savings gained by not having to
recompute costs at each iteration more than compensate for the extra reroutings performed.
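
The hybrid schedule described above can be sketched as follows. This is a minimal illustration, not the code used in the experiments: the function name `reroute_order` is hypothetical, the random half is drawn with replacement (the text does not say whether repeats are allowed), and an odd number of groups is handled by giving the extra slot to the random half.

```python
import random

def reroute_order(costs, min_costs, rng=random):
    """Order in which to reroute groups during the next k* iterations.

    costs[i] is C_i and min_costs[i] is C_i^*, both taken from the most
    recent termination check. The first half of the schedule is
    deterministic (largest C_i - C_i^* first); the second half is drawn
    uniformly at random so that no small set of groups dominates.
    """
    k_star = len(costs)
    half = k_star // 2
    by_gap = sorted(range(k_star),
                    key=lambda i: costs[i] - min_costs[i],
                    reverse=True)
    deterministic = by_gap[:half]
    # Assumption: uniform sampling with replacement.
    randomized = [rng.randrange(k_star) for _ in range(k_star - half)]
    return deterministic + randomized

order = reroute_order([9.0, 4.0, 7.0, 5.0], [3.0, 3.5, 6.0, 1.0],
                      rng=random.Random(0))
```

Because the costs come from the termination check, building this schedule requires no minimum-cost flow computations beyond those the check already performs.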

4.3.3  Implementing  the  Minimum-cost  Flow 

Once  the  algorithm  has  chosen  a  commodity  group  to  reroute,  it  must  find  an  appropriate 
minimum-cost flow. For this purpose, we use the RELAXT-III minimum-cost flow code of
Bertsekas and Tseng [8]. One drawback of the routine we have chosen is that it requires integer
capacities,  costs,  and  demands,  making  preprocessing  and  postprocessing  necessary  each  time 
it  is  called.  Another  routine  might  better  suit  our  algorithm,  but  we  concentrate  mainly  on  the 
number  of  iterations  of  our  algorithm  and  treat  the  minimum-cost  flow  routine  as  a  black  box. 

For the costs used to calculate the minimum-cost flow, we use a slightly different length
function from that proposed in Algorithm Decongest. Instead of setting the length $\ell(vw)$
of each edge $vw \in E$ equal to $e^{\alpha\lambda(vw)}/u(vw)$, we use a length function in which
$\ell(vw) = \lfloor e^{\alpha(\lambda(vw) - \lambda) + c} \rfloor$, where $c$ is a scaling constant that depends on the largest
integer the system can handle. (Note the similarity to the techniques used in Section 2.4.2.)
We include the terms $-\lambda$ and $c$ because we want to extract real flows from a routine that works
only with integers. These terms spread the lengths over the range of viable non-negative integers,
giving us the most accurate minimum-cost flow we can procure. We have removed the $u(vw)$
factor so that edges with equally high congestion have equally high cost in the minimum-cost
flow. We have found through limited experimentation that this strategy produces minimum-cost
flows which better suit our algorithm.
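
As a rough illustration of this integer length function, the sketch below computes the rounded-down exponential for every edge. It assumes that the congestion $\lambda$ is the maximum of the edge congestions $\lambda(vw)$; the function name `integer_lengths` and the sample values of $\alpha$ and $c$ are chosen only for the example.

```python
import math

def integer_lengths(edge_congestion, alpha, c):
    """Return floor(exp(alpha * (lambda(vw) - lambda) + c)) for each edge.

    edge_congestion maps an edge to its congestion lambda(vw). Assumption:
    lambda is the maximum of these values, so every exponent is at most c
    and the most congested edge receives the largest length, floor(e^c).
    """
    lam = max(edge_congestion.values())
    return {edge: math.floor(math.exp(alpha * (cong - lam) + c))
            for edge, cong in edge_congestion.items()}

lengths = integer_lengths({"vw": 1.0, "wx": 0.5, "xy": 0.0}, alpha=4.0, c=10.0)
```

Subtracting $\lambda$ pins the largest length at $\lfloor e^c \rfloor$ regardless of how congested the network is, so the choice of $c$ alone controls how much of the integer range is used.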

4.3.4  Choosing  Constants  and  Rerouting 

As is evident from the analysis of ScalingConcurrent in Chapter 2, the constant $\alpha$ and the
fraction $\sigma$ of flow rerouted greatly affect the running time of the algorithm. The values used
in algorithm Decongest are very large for $\alpha$ and very small for $\sigma$. The constant $\alpha$ can easily
exceed 1000, and $\sigma$ can easily fall below 10~‘. For the algorithm to progress at a reasonable
rate in practice, given the fixed precision of computers, we need to use smaller values for $\alpha$
and larger values for $\sigma$. As the algorithm progresses, however, we need to use a larger $\alpha$ and
a smaller $\sigma$ (see the description of Decongest for details). We control the rate of growth of
$\alpha$ and $\sigma$ by means of a scaling factor $s$. We set $\alpha$ equal to $d \cdot s/\lambda$, where $d$ is the constant
$c - \log m$. Here $c$ is chosen to be the largest value $x$ such that $m \cdot e^x$, the largest possible value
of the potential function, does not cause overflow. One key to making progress is to decide
when to decrease $\sigma$.

To decide how much flow to reroute, we sample the values that the potential function $\Phi$
would take after rerouting various fractions of flow. We do not need to restrict ourselves to a
value of $\sigma$ that guarantees improvement in every iteration; we only need to choose a value that
guarantees us improvement in that particular iteration, thereby allowing for the possibility of
rerouting much larger fractions of flow than in procedure Reduce. In fact, we can try to choose
the best possible value for $\sigma$, i.e., the one that gives the greatest reduction in $\Phi$. We can find
$\sigma$ efficiently because $\Phi$ is a convex function of $\sigma$.

More precisely, we take advantage of the following:

Lemma 4.3.1 Let $f$ be a flow and $f_i^*$ be the minimum-cost flow computed by procedure
Decongest. Let $\Phi(\sigma)$ be the value of the potential function after rerouting a $\sigma$ fraction of the flow
from $f_i$ onto $f_i^*$. Then $\Phi(\sigma)$ is a convex function of $\sigma$.

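Proof: We will prove the lemma for the potential function given in Chapter 2. We shall use
the notation $\exp(x)$ to denote $e^x$. Recall that

$$\Phi = \sum_{vw \in E} u(vw)\,\ell(vw).$$

Thus after rerouting a $\sigma$ fraction of the flow,

$$\Phi(\sigma) = \sum_{vw \in E} \exp\left( \frac{\alpha}{u(vw)} \left( \sum_{j \ne i} |f_j(vw)| + \left| (1 - \sigma) f_i(vw) + \sigma f_i^*(vw) \right| \right) \right). \tag{4.2}$$

Observe that for each edge $vw$, the quantity $\exp\left( \frac{\alpha}{u(vw)} \sum_{j \ne i} |f_j(vw)| \right)$, which we denote
by $Q(vw)$, is independent of $\sigma$ and is always positive. We use $S(vw)$ to denote the sign of
$(1 - \sigma) f_i(vw) + \sigma f_i^*(vw)$, i.e., $S(vw) = 1$ if that quantity is positive and $-1$ otherwise. We can
now rewrite equation (4.2) as

$$\Phi(\sigma) = \sum_{vw \in E} Q(vw) \exp\left( \frac{\alpha S(vw)}{u(vw)} \left( (1 - \sigma) f_i(vw) + \sigma f_i^*(vw) \right) \right). \tag{4.3}$$

We can now take the first derivative of $\Phi(\sigma)$ with respect to $\sigma$,

$$\frac{d\Phi}{d\sigma} = \sum_{vw \in E} Q(vw) \exp\left( \frac{\alpha S(vw)}{u(vw)} \left( (1 - \sigma) f_i(vw) + \sigma f_i^*(vw) \right) \right) \frac{\alpha S(vw)}{u(vw)} \left( -f_i(vw) + f_i^*(vw) \right),$$

and the second derivative

$$\frac{d^2\Phi}{d\sigma^2} = \sum_{vw \in E} Q(vw) \exp\left( \frac{\alpha S(vw)}{u(vw)} \left( (1 - \sigma) f_i(vw) + \sigma f_i^*(vw) \right) \right) \left( \frac{\alpha S(vw)}{u(vw)} \left( -f_i(vw) + f_i^*(vw) \right) \right)^2. \tag{4.4}$$

Now observe that the multiplicands in (4.4) are all non-negative, and hence the second derivative is
never negative and the function is convex. Note that this is also true for the $\Phi$ that is used in
practice, as this $\Phi$ can be written as the $\Phi$ here with each term multiplied by a positive constant.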


Since $\Phi$ is convex, we know that it has at most one local minimum. Thus if we search
for the minimum by sampling 5 equally spaced points, we know that we can always eliminate
1/4 of the possible values at each iteration. Thus we can efficiently find the minimum, or at
least a point close to the minimum. We sample fractions to the precision $.001/s^2$, and we also
use this value as a floor $\sigma_{\min}$ on the fraction of flow that can be rerouted. To avoid wasting
time rerouting small amounts of flow, we reroute a commodity only if $\sigma$ is at least as large as
$\sigma_{\min}$. We know that we may have to reroute fractions as small as $O(\epsilon/(\alpha\lambda))$, and so we must
decrease $\sigma_{\min}$ faster than we increase $\alpha$ to lower the minimum value for $\sigma$. We begin with $s$
equal to .25 and raise it by .25 whenever the maximum fraction rerouted in $k^*$ iterations is less
than $\sigma_{\min}/(s \cdot k^*)$ or whenever the ratio of the congestion $\lambda$ to its lower bound
increases after $k^*$ iterations. We have found that this strategy works well in most instances but
scales $\alpha$ too fast in a few instances, slowing the algorithm too much for practical use. In such
cases, we rerun the algorithm, scaling $\alpha$ more slowly. We have not yet discovered the optimal
rate at which we should scale $\alpha$, nor have we discovered exactly when we should scale it. This
is the area in which our algorithm would benefit most from further research. Other areas in
which it could be further improved include the selection strategy for commodities to reroute
and the technique for choosing $\sigma$.
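
The sampling search can be illustrated with a short Python sketch. It is a generic five-point bracketing search for a function with a single local minimum, written under the assumption that the fraction lies in [0, 1]; the names `five_point_minimize` and `toy_phi` are hypothetical, and the toy potential merely mimics the shape of (4.3) as a sum of exponentials whose exponents are linear in the rerouted fraction.

```python
import math

def five_point_minimize(phi, lo=0.0, hi=1.0, precision=1e-3):
    """Approximate the minimizer on [lo, hi] of a function with one local minimum.

    Each round samples five equally spaced points and keeps only the
    sub-interval around the best sample, which must contain the minimizer.
    """
    while hi - lo > precision:
        step = (hi - lo) / 4.0
        points = [lo + j * step for j in range(5)]
        values = [phi(x) for x in points]
        best = values.index(min(values))
        lo = points[max(best - 1, 0)]
        hi = points[min(best + 1, 4)]
    return (lo + hi) / 2.0

def toy_phi(sigma):
    # Stand-in for the potential: exponentials with exponents linear in sigma.
    return math.exp(3.0 * (1.0 - sigma)) + math.exp(2.0 * sigma)

sigma = five_point_minimize(toy_phi)
```

In this sketch the bracket shrinks in every round, so the number of evaluations of the potential grows only logarithmically in the inverse of the requested precision.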

4.4  Experimental  Results 

We have tested our algorithm on a variety of problems and compared its performance to
the theoretical bounds. We used two different random network generators, NETGEN [37] and
RMFGEN [23]. RMFGEN generates graphs that have a set of square planes with connections
between adjacent planes. When we refer to a graph generated by RMFGEN, we will indicate the
number of planes. When run on random NETGEN and RMFGEN graphs with randomly placed
commodities, our algorithm behaved more or less as expected. It took polynomially in $\epsilon^{-1}$
more time to get closer to the optimal solution and less than linearly in $k^*$ more time to handle
larger numbers of commodities. Furthermore, for large numbers of commodities, our algorithm
outperformed the linear programming-based code of Kennington. Our algorithm performed
poorly on one real problem provided by the GTE Corporation, but as we explain in Section
4.4.5 we consider this an anomaly arising from a limited number of unusually time-consuming
minimum-cost flow computations. This one instance aside, we find our results encouraging
and consider our algorithm an improvement, in many cases, over the simplex-based algorithms
that have preceded it.

4.4.1  Dependence  on  the  Error  Parameter 

The theory predicts an inverse polynomial dependence of the running time on the error
parameter $\epsilon$. More precisely, it states that the number of minimum-cost flow computations depends
on $\epsilon^{-2}$. Since our algorithm computes a constant number of minimum-cost flows per
iteration, the number of iterations should also depend on $\epsilon^{-2}$. Equivalently, $\epsilon$ should depend on
$1/\sqrt{\#\text{ of iterations}}$.

We ran our algorithm on various problems and graphed the lowest $\epsilon$ achieved against the
number of iterations completed. Each run stopped at a final $\epsilon$ of .001 or less. To compress the
data, we used data points representing ranges of iterations. For each problem, we considered
10 runs and, for each run, the minimum $\epsilon$ achieved at each termination check. The aggregate
$\epsilon$ for a range equaled the average of the minimum $\epsilon$ values found at the termination checks
falling in the range during each of the 10 runs. We examined a problem with 20 commodities
and four problems with 10 commodities using different NETGEN graphs with 50 nodes and 100
edges. We also examined two problems with 10 and 20 commodities, respectively, using an
RMFGEN graph with 140 edges and 48 nodes (spread evenly over 12 square planes). To test a
large problem, we examined a single run on a large RMFGEN problem with 700 commodities,
2075 edges, and 500 nodes (spread over 20 square frames). For all these problems, we graphed $\epsilon$
versus the number of iterations. We also graphed the function $1/\sqrt{\#\text{ of iterations}}$ on which we
expected $\epsilon$ to depend. As is evident from Figures 4.5 through 4.11, our implementation always
performed better than the expected bounds. Some inconclusive attempts at fitting the data to
a curve of the form $a \cdot (\#\text{ of iterations})^b + c$ using a regression package lend some additional
support to this conclusion, as typical values of $b$ were between $-.5$ and $-1$.
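
To make the curve-fitting step concrete, the following sketch estimates the exponent $b$ by ordinary least squares on log-transformed data. It is not the regression package used for the experiments: it assumes the offset $c$ is zero so that the model becomes linear in log-log space, the function name `fit_power_law` is hypothetical, and the data below are synthetic values generated to follow the $1/\sqrt{\#\text{ of iterations}}$ law exactly rather than measurements.

```python
import math

def fit_power_law(iterations, epsilons):
    """Least-squares fit of log(eps) = log(a) + b * log(N); returns (a, b)."""
    xs = [math.log(n) for n in iterations]
    ys = [math.log(e) for e in epsilons]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx
    a = math.exp(y_mean - b * x_mean)
    return a, b

# Synthetic data following eps = 2 / sqrt(N); not experimental results.
iterations = [100, 400, 1600, 6400]
epsilons = [2.0 / math.sqrt(n) for n in iterations]
a, b = fit_power_law(iterations, epsilons)
```

A fitted $b$ of $-.5$ corresponds to the predicted behavior, while values closer to $-1$ mean that the error falls faster with the number of iterations than the theory guarantees.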

4.4.2  Dependence  on  the  Number  of  Commodities 

With  respect  to  the  number  of  commodities  k,  our  algorithm  also  seems  to  conform  to  the 
theoretical  bounds.  Using  10  runs  for  each  data  point,  we  graphed  the  average  number  of 
iterations  needed  to  solve  problems  with  variable  numbers  of  commodities  given  a  fixed  graph. 
In Figure 4.12, we examined four NETGEN graphs with 50 nodes and 100 edges and values of
$k$ between 10 and 70. In Figure 4.13, we traced the same values of $k$ using an RMFGEN graph
with 140 edges and 48 nodes (spread over 12 square planes). In Figure 4.14, using values of
$k$ between 50 and 250, we examined an RMFGEN graph with 752 edges and 192 nodes (spread
over 12 square planes). Graphing the number of iterations against the number of commodity
groups $k^* \le k$, we observed that the number of iterations either grew linearly or grew linearly
to a peak and then dropped. The drops may result from larger numbers of commodities making
it possible to route commodities over a smaller number of paths and over shorter paths. In
trying to find flows that give the edges equal congestion, the algorithm has more commodities
at its disposal to congest each edge. In any case, the number of iterations grows no more than
linearly with the number of commodity groups and therefore no more than linearly with the
number of commodities.

4.4.3  Comparison  to  Other  Algorithms 

Because  the  running  time  of  our  algorithm  grows  no  more  than  linearly  with  the  number 
of  commodities,  it  can  effectively  solve  large  concurrent  flow  problems.  To  the  best  of  our 
knowledge, our implementation is the first for an algorithm that finds an $\epsilon$-optimal solution
to  the  general  concurrent  flow  problem.  Consequently,  comparisons  to  existing  algorithms 
inherently  contain  some  amount  of  bias.  We  have  nevertheless  compared  our  algorithm  to 
another  as  best  we  could.  The  fact  that  our  algorithm  runs  faster  than  another  on  a  particular 
problem  instance  does  not  necessarily  mean  our  algorithm  is  faster  in  general.  The  comparison 
reveals  sufficiently  consistent  trends,  however,  that  enable  us  to  draw  some  general  conclusions. 

We begin with a brief discussion of the algorithm to which we have compared our algorithm.
The algorithm is MCNF85, a special purpose simplex code for multicommodity flow problems
written by Kennington [32]. We chose it for two reasons. First, we had access to the code
on our machine. Second, and more importantly, previous tests by Adler, Karmarkar, Resende
and Veiga [1] demonstrate its efficiency. Adler et al. compared three different codes for
multicommodity flow: MINOS 5.0, which is an advanced implementation of the simplex method [47],
MCNF85, and their own interior point method. Their experiments show that the running time
of MINOS grows much faster than that of the other two algorithms and that, for the problems
they tested, MCNF85 and the interior point algorithm have comparable running times. Thus
we concluded that MCNF85 was one of the best codes available at that time. Several people,
however, have pointed us towards codes, particularly interior point codes, that may
be better than MCNF85 on this class of problems. We are in the process of comparing our
algorithm against these other codes and will report the results when they become available.

We faced two obstacles in comparing our algorithm to MCNF85. First, our algorithm finds
an approximate solution while MCNF85 finds an exact solution. Since we did not possess the
programming skills needed to modify MCNF85 to alleviate this problem, we ran our algorithm
to both $\epsilon = .01$ and $\epsilon = .001$ before comparing it to MCNF85. The second difficulty in making
the comparison is that the algorithms are designed for different objective functions. By using
an objective function of 0 for MCNF85 and a cost of 0 on each edge, we can treat it as an
algorithm that determines whether a feasible multicommodity flow exists. We could then call
this algorithm $O(\log(n\epsilon^{-1}))$ times to find an $\epsilon$-optimal solution to a concurrent flow problem,
but to do so seems too far from the original purpose of the algorithm for fair comparison.
Instead, we ran our algorithm to find the maximum $z$ for which there exists a feasible flow
satisfying a percentage $z$ of each demand. We then scaled the demands by $z$ to get a problem
that we knew to be feasible. This problem corresponds to the problem that MCNF85 would
have to solve in the last iteration of the binary search procedure defined above. We compared
a run of our algorithm to a run of MCNF85 with the input modified as described above. We
could better evaluate our algorithm by comparing it to other approximation codes for the same
problem. As mentioned above, however, we could not make such a comparison because we do
not know of any such codes.

4.4.4  The  Results 

The results of our experiments appear in Figure 4.1. The experiments in this table were
performed on a Silicon Graphics 4D/340S. They show that as the number of commodities
increases, the running time of MCNF85 grows much more rapidly than the running time of our
algorithm for graphs of all sizes. The difference does not arise simply because we group the
commodities (they could incorporate grouping in their algorithm too). Hardly any grouping
occurred in the graphs with 500 nodes and 70 or fewer commodities, and the running time of
our algorithm still grew much more slowly than the time for MCNF85. In fact, as discussed
above, the running time of our algorithm grows slower than $k$ while rough analysis of the data
shows that the time for MCNF85 grows at least as fast as $k^2$. We show two examples graphically
in Figures 4.2 and 4.3. Since the size of the linear program grows by $k^2$, this growth is not
particularly surprising.

Our  algorithm  will  be  able  to  solve  large  and  previously  unsolvable  multicommodity  flow 
Problem Specification                     Kennington   Our algorithm
nodes   edges   commodities   generator                ε = .01   ε = .001
50      100     20            NG          49           20        103
50      100     50            NG          397          35        43
50      100     70            NG          857          29        33
48      140     10            RMF         8            13        13
48      140     20            RMF         24           23        23
48      140     30            RMF         69           18        35
48      140     40            RMF         122          25        38
48      140     50            RMF         216          21        71
48      140     60            RMF         316          40        61
48      140     70            RMF         470          45        62
500     2075    10            RMF         87           831       5230
500     2075    20            RMF         608          1484      2641
500     2075    30            RMF         1831         2625      3881
500     2075    40            RMF         6571         3762      6084
500     2075    50            RMF         15601        4710      7401
500     2075    60            RMF         18449        3819      6201
500     2075    70            RMF         34362        4435      8258
500     2075    700           RMF                           22411
192     748     50            RMF         2702         240       589
192     748     250           RMF         85754        637       1571
49      260     585           none        1373              2472 (estimate)

Figure 4.1: Running time comparison of our algorithm and Kennington's algorithm. Running
times are in seconds on a Silicon Graphics machine. NG is NETGEN and generator RMF is
RMFGEN. The last problem is the problem defined in Section 4.4.5.
[Plot: 50 node, 200 edge RMFGEN graph. Series: MCNF85; our algorithm, ε = .001; our algorithm, ε = .01.]

Figure 4.2: A comparison between our algorithm and MCNF85. This problem has 50 nodes
and 200 edges.

[Plot: 500 node, 2075 edge RMFGEN graph. Series: MCNF85; our algorithm, ε = .001; our algorithm, ε = .01.]

Figure 4.3: A comparison between our algorithm and MCNF85. This problem has 500 nodes
and 2075 edges.
Problem Specification                     ε             % of time finding
nodes   edges   commodities   generator                 min-cost flows
50      100     20            NG          .001          49.9
50      100     50            NG          .001          50.5
50      100     70            NG          .001          43.6
48      140     30            RMF         .001          44.1
48      140     40            RMF         .001          44.1
48      140     50            RMF         .001          42.4
48      140     60            RMF         .001          44.7
48      140     70            RMF         .001          47.7
500     2075    10            RMF         .01           (illegible)
500     2075    30            RMF         .01           79.7
500     2075    40            RMF         .01           73.3
500     2075    50            RMF         .01           76.8
500     2075    60            RMF         .01           80.7
500     2075    70            RMF         .01           77.7
192     752     50            RMF         .001          55.8
192     752     250           RMF         .001          59.0
49      260     585           none        (illegible)   99.8

Figure 4.4: Percentage of time that our algorithm spent performing minimum-cost flows. The
data was obtained from the UNIX profiling routine prof. NG is NETGEN and generator RMF is
RMFGEN. The last problem is the problem defined in Section 4.4.5.

problems. We have already shown that we can solve a 700 commodity problem faster than
MCNF85 can solve a 70 commodity problem. For large graphs with small numbers of
commodities, our algorithm is slower than MCNF85. The rapid growth rate of MCNF85, however,
with respect to the number of commodities makes our algorithm more desirable for problems
with more than a few commodities. As discussed in Chapter 3, one of the motivations for this
work comes from multicommodity flow problems that arise in approximating various NP-hard
problems. (See [43], [33], [35], and [42] for details.) These problems have large numbers of
commodities, i.e., at least as many commodities as the number of nodes. Our algorithm provides a
practical means for solving such problems.


4.4.5  An  Anomaly 

In one case, a problem with 49 nodes, 260 edges, and 585 commodities using actual data
from GTE, our algorithm performed much more poorly than the linear programming algorithm.
Though our algorithm ran for only 3745 iterations, a reasonable number, those iterations took a
total of 18.4 hours of CPU time. We attribute this anomaly to inefficiency in the minimum-cost
flow routine, since minimum-cost flow computations accounted for over 99.8% of the running
time. The theory shows that minimum-cost flow computations dominate the running time of
the algorithm, but even for the much larger RMFGEN graph with 500 nodes and 2075 edges,
minimum-cost flow computations generally took less than 80% of the time. For small graphs,
they generally took between 40 and 50 percent of the time. See Figure 4.4 for a more detailed
description of the times. The time spent solving the GTE problem was not equally divided
between iterations. Iterations including the termination check aside, most iterations took less
than 100 milliseconds. Some iterations, however, took hundreds of seconds, up to 1000 times
the normal duration.

With the help of several other researchers, we have verified that these are problems on
which RELAXT-III takes an inordinately long amount of time. Several people have run these
problems on their codes and observed no anomalous behavior, i.e., the running times for this
set of problems are all approximately the same. In order to estimate a more realistic running
time for this problem, we compute an upper bound on what the running time would have been
if we were using the RNET code of Grigoriadis. Joseph Cheriyan [10] has reported that on a
representative sample of these minimum-cost flow problems, the running time of RNET on a
SPARC2 (which is slower than our machine) never exceeds 0.66 seconds. Using the estimate
that 50% (see Figure 4.4) of the time is spent in the minimum-cost flow computations, we arrive
at a figure of 2472 seconds as a “reasonable” upper bound on the running time of this instance.


4.5  Conclusions  and  Open  Problems 

Our algorithm performs as well as, and often better than, the theoretical bounds. The theory
predicts the number of iterations of the algorithm to be $O(\epsilon^{-2}k)$. Our experiments show that
the number of iterations often grows more slowly as a function of $\epsilon$. Our experiments also show that
for small $k$, the number of iterations does increase linearly with $k$. As $k$ approaches the number
of nodes, however, the number of iterations grows at most linearly and sometimes actually
decreases.

On  the  problems  we  tested,  the  running  time  of  our  algorithm  grew  much  slower  as  a 
function  of  k  than  that  of  Kennington’s  algorithm,  thereby  implying  that  our  algorithm  is 
preferable  to  one  of  the  best  network  simplex  based  approaches  for  problems  with  large  numbers 
of  commodities. 

The performance of our algorithm was heavily influenced by our choice of when to scale $\alpha$.
We tested several strategies and found that different strategies performed better for different
problems. We therefore believe that more work is needed to find a strategy that works well for
all problems.

Our  algorithm  might  be  improved  by  using  a  different  minimum-cost  flow  algorithm.  In  fact, 
we  do  not  require  the  exact  solution  to  a  minimum-cost  flow  but  only  an  approximate  solution. 
An  algorithm  that  is  able  to  find  fast  approximations  to  a  minimum-cost  flow  might  significantly 
improve  the  running  time  of  our  algorithm.  Also,  the  minimum-cost  flow  problems  we  solve  for 
the  same  commodity  might  have  similar  solutions.  Using  the  solution  to  the  previous  problem 
as  a  starting  point  for  the  new  problem  might  improve  the  running  time. 

We  are  aware  of  two  other  implementations  of  combinatorial  algorithms  to  which  we  should 
compare  our  algorithm.  The  first,  by  Shahrokhi  and  Matula  [59],  works  only  for  graphs  in 
which  every  capacity  and  demand  is  1,  but  it  would  still  be  interesting  to  see  how  our  algorithm 
compares  to  theirs  on  this  class  of  graphs.  The  second,  by  Schneur  [54],  also  works  by  gradually 
rerouting  flow.  She  has  shown  that  her  algorithm  runs  well  on  many  problems.  We  would  like 
to  compare  the  algorithms  on  the  same  machine  and  the  same  problems. 

We  conclude  by  mentioning  a  valuable  lesson  learned  about  accuracy.  In  the  theoretical 
results  of  Chapter  2,  a  good  deal  of  technical  effort  was  used  to  convert  the  results  from  a 
model  of  computation  in  which  infinite  precision  is  used  to  a  RAM  model  of  computation.  It  is 
tempting  to  view  this  work  as  being  of  purely  theoretical  interest.  After  all,  modern  computers 
can  perform  arithmetic  operations  on  real  numbers  almost  as  quickly  as  they  can  perform 
arithmetic  operations  on  integers.  The  single  biggest  difficulty  in  implementing  this  algorithm, 
however,  was  dealing  with  the  finite  precision  of  the  computer.  The  exponents  of  the  length 
functions that we wished to compute were typically too big for the computer. Many decisions
had to be made about how to scale and round these numbers. In making these decisions the
theoretical work on adapting the algorithm for the RAM model of computation was very useful.


[Plots omitted. In each of the following six figures, the x-axis is the number of iterations
and the y-axis is $\epsilon$. The top curve is $1/\sqrt{\text{number of iterations}}$ and the bottom
curve is the minimum $\epsilon$ achieved.]

Figure 4.6: netgen graph with 50 nodes, 100 edges, and 20 commodities.

Figure 4.6: netgen graph with 50 nodes, 100 edges, and 10 commodities.

Figure 4.8: netgen graph with 50 nodes, 100 edges, and 10 commodities.

Figure 4.9: RMFGEN graph with 48 nodes, 140 edges, and 10 commodities.

Figure 4.10: rmfgen graph with 48 nodes, 140 edges, and 20 commodities.

Figure 4.11: rmfgen graph with 500 nodes, 2075 edges, and 700 commodities.

[Plots omitted. In each of the following three figures, the x-axis is the number of commodity
groups and the y-axis is the number of iterations.]

Figure 4.12: netgen graphs with 50 nodes, 100 edges, and from 10 through 70 commodities.
Each curve represents a set of runs on one of four different underlying graphs.

Figure 4.13: rmfgen graphs with 48 nodes, 140 edges, and from 10 through 70 commodities.
The curve represents a set of runs on one underlying graph.

Figure 4.14: RMFGEN graphs with 192 nodes, 740 edges, and from 50 through 250 commodities.
The curve represents a set of runs on one underlying graph.


Chapter  5 


Approximation  Algorithms  for  Shop 
Scheduling* 


5.1  Introduction 

Shop  scheduling  refers  to  a  large  class  of  problems  that  typically  arise  in  a  shop,  factory  or 
assembly  line  setting.  The  shop  has  m  machines,  and  in  the  basic  environment  each  machine 
is  different  and  performs  a  different  function.  Each  job  consists  of  a  set  of  operations,  each  of 
which  must  be  processed  on  a  particular  machine;  a  job  may  have  more  than  one  operation 
on  a  particular  machine.  We  wish  to  produce  a  schedule  that  assigns  a  period  of  time  to  each 
operation  during  which  it  is  processed  on  the  appropriate  machine.  The  goal  is  to  minimize 
the  completion  time  of  the  last  operation  to  complete,  while  ensuring  that  no  more  than  one 
operation  is  assigned  to  a  machine  at  any  point  in  time  and  no  two  operations  of  the  same  job 
are  scheduled  simultaneously. 

A  variety  of  constraints  may  be  introduced  on  the  order  of  execution  of  the  operations  of  the 
job,  and  different  sorts  of  constraints  yield  different  well-known  versions  of  the  problem.  (We 
focus  only  on  order  constraints  between  the  operations  of  each  job,  and  not  between  operations 
of  different  jobs.)  For  example,  if  we  impose  a  strict  total  order  on  the  order  of  execution  of 
the operations of a job, the problem is a job shop scheduling problem. If the total order is the
same total order for every job, and each job has at most one operation on each machine, we
have a flow shop scheduling problem. If there is no order at all imposed on the execution of any
job’s operations, we have an open shop problem. It is traditional in the scheduling literature
to focus, for the open shop problem, on the case when each job is processed on each machine
at most once (since operations on the same machine can be coalesced). We refer to the general
shop scheduling problem that does not fall into one of the three above categories as the dag
shop problem.

*This chapter describes joint work with David Shmoys and Joel Wein [60].

In this thesis we concentrate primarily on the job shop scheduling problem, for two reasons.
First, most of our results for other shop problems can be obtained as easy corollaries of
our results for the job shop problem. Second, the job shop problem is probably the most famous
and most difficult of all the versions of the problem. It is strongly $\mathcal{NP}$-hard, and moreover,
except for the cases when there are two jobs or when there are two machines and each job has
at most two operations, essentially all special cases of this problem are $\mathcal{NP}$-hard, and typically
strongly $\mathcal{NP}$-hard [19, 39]. For example, it is $\mathcal{NP}$-hard even if there are 3 machines, 3 jobs and
each operation is of unit length; in this case we can think of the input length as the maximum
number of operations in a job, $\mu$.

In  addition  to  this  theoretical  evidence  of  the  difficulty  of  the  job  shop  problem,  it  is  also 
one of the most notoriously difficult $\mathcal{NP}$-hard optimization problems in terms of practical
computation,  even  with  very  small  instances  being  difficult  to  solve  exactly.  A  striking  example 
of  this  difficulty  is  that  a  single  instance  of  the  problem  involving  only  10  jobs,  10  machines 
and  100  operations,  which  first  appeared  in  a  book  by  Muth  and  Thompson  in  1963,  remained 
unsolved  for  23  years  despite  repeated  attempts  to  find  an  optimal  solution  [39].  Today,  due 
to  better  algorithms  and  faster  machines,  instances  with  10  jobs  and  10  machines  seem  to  be 
tractable.  Applegate  and  Cook  solved  ten  different  10  x  10  problems,  including  the  notorious 
instance  mentioned  above,  in  times  ranging  from  90  seconds  to  42  minutes.  (It  is  interesting  to 
note  that  the  instance  of  Muth  and  Thompson  was  one  of  the  easier  instances  to  solve  using 
their  technique).  Slightly  larger  instances,  however,  are  still  currently  intractable;  they  report 
instances of size 10 x 15, 15 x 20, 15 x 15 and 10 x 20 that they were unable to solve [4].

Formal Definition and Previous Results

We formally define the job shop problem as follows. We are given a set
$\mathcal{M} = \{m_1, m_2, \ldots, m_m\}$ of machines, a set $\mathcal{J} = \{J_1, \ldots, J_n\}$ of jobs, and a set
$\mathcal{O} = \{O_{ij} \mid i = 1, \ldots, \mu_j,\ j = 1, \ldots, n\}$ of operations, where $\kappa_{ij}$ indexes the
machine on which operation $O_{ij}$ runs. Thus $m$ is the number of machines, $n$ is the
number of jobs, $\mu_j$ is the number of operations of job $J_j$, and $\mu = \max_j \mu_j$. $O_{ij}$ is the
$i$th operation of $J_j$; it requires processing time on a given machine $m_k \in \mathcal{M}$, where
$k = \kappa_{ij}$, for an uninterrupted period of a given length $p_{ij}$. (In other words, this
is a non-preemptive model. A model in which operations may be interrupted and resumed at a
later time is called a preemptive model.) Each machine can process at most one operation at a
time, and each job may be processed by at most one machine at a time. If the completion time
of operation $O_{ij}$ is denoted by $C_{ij}$, then the objective is to produce a schedule that minimizes
the maximum completion time, $C_{\max} = \max_{i,j} C_{ij}$; the optimal value is denoted by $C^*_{\max}$.

It  is  possible  to  extend  this  model  by  associating  with  each  job  Jj  a  release  date  rj,  on  which 
Jj  becomes  available  for  processing.  A  theorem  of  Shmoys,  Wein  and  Williamson  [61]  shows 
that  the  length  of  the  optimal  schedule  is  no  more  than  twice  the  length  of  the  optimal  schedule 
for  the  corresponding  problem  without  release  dates.  All  our  results  thus  apply  to  this  model, 
with  the  corresponding  bounds  multiplied  by  2. 

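The formal definitions of the flow, open and dag shop problems are almost the same, except
for the following small differences:

• flow shop: $\kappa_{ij} = \kappa_{ij'}$ for all $i$ and all jobs $j, j'$, so that every job visits the machines
in the same order, and each job has at most one operation on each machine.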

•  open  shop:  The  Oij  can  be  processed  in  any  order. 

•  dag  shop:  For  each  job  j  we  define  a  partial  order  on  the  Oij  and  require  that  they  be 
processed  in  any  total  order  consistent  with  that  partial  order. 

There are two fundamental lower bounds on the length of an optimum schedule. Since each
job must be processed, $C^*_{\max}$ must be at least the maximum total length of any job,
$\max_j \sum_i p_{ij}$, which we shall call the maximum job length of the instance, $P_{\max}$.
Furthermore, each machine must process all of its operations, and so $C^*_{\max}$ must be at least
$\max_k \sum_{\kappa_{ij} = k} p_{ij}$, which we call the maximum machine load of the instance, $\Pi_{\max}$.
These lower bounds apply regardless of whether we have a job, flow, open or dag shop problem.
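
As an illustration of these two lower bounds, the following minimal sketch computes the
maximum job length and the maximum machine load for a small instance. The representation (a
list of jobs, each a list of (machine, processing time) pairs in job order), the function names,
and the sample data are assumptions made for this example and do not come from the thesis.

```python
from collections import defaultdict


def max_job_length(jobs):
    """P_max: the largest total processing time of any single job."""
    return max(sum(p for _, p in ops) for ops in jobs)


def max_machine_load(jobs):
    """Pi_max: the largest total processing time assigned to any one machine."""
    load = defaultdict(int)
    for ops in jobs:
        for machine, p in ops:
            load[machine] += p
    return max(load.values())


# Hypothetical instance: jobs[j] lists (machine, processing time) in job order.
jobs = [
    [(0, 3), (1, 2), (0, 1)],
    [(1, 4), (0, 2)],
    [(1, 1), (1, 1)],
]

# Any schedule, for any of the shop models, is at least this long.
lower_bound = max(max_job_length(jobs), max_machine_load(jobs))
```

For this instance the job lengths are 6, 6 and 2 and the machine loads are 6 and 8, so the
larger of the two bounds is the machine load.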

There has been a tremendous amount of literature on shop scheduling problems over the
last thirty years [39]. We mentioned earlier that all but the most restrictive versions of the job
shop problem are $\mathcal{NP}$-hard, as are the other versions of the problem. When there are at least
3 machines both the open and flow shop problems are $\mathcal{NP}$-hard [39]. When there are just two
machines both these problems are known to be in $\mathcal{P}$ [28, 24]. In contrast, the two-machine
job shop problem is only known to be polynomial-time solvable if each job has at most two
operations, or if each operation has unit size [39].

Despite all the attention, however, surprisingly little has been known about approximation
algorithms for shop scheduling problems. In fact, all that was known was the following
observation by Gonzalez and Sahni:

Theorem 5.1.1 [25] An algorithm $A$ for the job shop problem that produces a schedule in which
at least one machine is running at any point in time is an $m$-approximation algorithm.

Proof: The length $C_{\max}(A)$ of the schedule produced by such an algorithm is bounded above
by $\sum_{i,j} p_{ij}$, since some operation is always being executed. On the other hand, the average
machine load, $\frac{1}{m}\sum_{i,j} p_{ij}$, is a lower bound on the maximum machine load, which is a lower
bound on the optimal schedule length. The theorem follows directly. ■
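
Written as a single chain, the argument above reads as follows. The symbols $\Pi_{\max}$ (maximum
machine load) and $C^*_{\max}$ (optimal schedule length) are the notation assumed here for the
quantities named in the proof.

```latex
C_{\max}(A) \;\le\; \sum_{i,j} p_{ij}
           \;=\; m \cdot \frac{1}{m} \sum_{i,j} p_{ij}
           \;\le\; m \, \Pi_{\max}
           \;\le\; m \, C^*_{\max}.
```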

Little was also known in the way of negative results, that is, results indicating that it is
difficult to approximate these problems. Recently, however, Williamson, Hall, Hoogeveen, Hurkens,
Lenstra, and Shmoys [70], extending work by Williamson [69], have shown that unless
$\mathcal{P} = \mathcal{NP}$, none of these problems can be approximated arbitrarily closely.

Theorem 5.1.2 [70] Unless $\mathcal{P} = \mathcal{NP}$, there is no polynomial-time algorithm that approximates
any of the job shop, flow shop or open shop problems within a factor of less than $5/4$. ■

Despite the lack of knowledge about approximation algorithms with good worst-case relative
error guarantees, there are two relevant results that are important to our work. The most
interesting approximation algorithms to date for job shop scheduling have appeared primarily
in the Soviet literature and are based on a beautiful connection to geometric arguments. This
approach was independently discovered by Belov and Stolin [7] and by Sevast'yanov [56] as well
as by Fiala [15]. This approach typically produces schedules for which the length can be bounded
by $\Pi_{\max} + q(m, \mu)\,p_{\max}$, where $q(\cdot,\cdot)$ is a polynomial, and $p_{\max} = \max_{i,j} p_{ij}$ is the maximum
operation length. For the job shop problem, Sevast'yanov [57, 58] gave a polynomial-time
algorithm that delivers a schedule of length at most $\Pi_{\max} + O(m\mu^3)\,p_{\max}$. The bounds obtained
in this way do not give good worst-case relative error bounds. Even for the special case of the
flow shop problem, the best algorithms to date delivered solutions of length $\Omega(m\,C^*_{\max})$.

Since these results are not well known in the West, yet are important tools for us, we provide
here a bit of information about the proof of the flow shop result, which is simpler than the more
general job shop result. This simpler presentation of the proof is due to David Shmoys [62].

Theorem 5.1.3 There exists a polynomial-time algorithm $A$ for the flow shop problem that yields
a schedule of length bounded above by $\Pi_{\max} + m(m-1)p_{\max}$.

Proof: 

The  proof  relies  heavily  on  the  following  lemma. 

Lemma 5.1.4 Let $\{v_1, v_2, \ldots, v_n\}$ be a set of $d$-dimensional vectors such that
$\sum_{j=1}^{n} v_j = 0$. There exists a polynomial-time algorithm that computes a permutation $\pi$
such that for any $k = 1, \ldots, n$, $\|\sum_{j=1}^{k} v_{\pi(j)}\| \le d \max_j \|v_j\|$, where we use $\|x\|$
to denote the $L_\infty$-norm of $x$.

Without loss of generality, we can assume that the load on each machine is equal to the
maximum machine load, namely $\Pi_{\max}$. In this case the completion time of the schedule is
$\Pi_{\max} + I$, where $I$ is the amount of idle time on the last machine before it starts processing the
last operation of the last job to complete on it. If we choose a permutation $\pi$ of the $n$ jobs and
schedule their operations in that order on every machine, the condition
$\|\sum_{j=1}^{k} v_{\pi(j)}\| \le (m-1)p_{\max}$ for all $k$ yields an upper bound of $m(m-1)p_{\max}$ on $I$.

Now if we construct a set of $n$ $(m-1)$-dimensional vectors $v_j$, where
$v_j = (p_{1j} - p_{2j},\ p_{2j} - p_{3j},\ \ldots,\ p_{m-1,j} - p_{mj})$, the algorithm mentioned in the previous
lemma produces the necessary permutation. ■
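
To make the objects in this proof concrete, the sketch below builds the difference vectors
$v_j$ for a small flow shop instance and evaluates the schedule obtained by running the jobs
in one fixed order on every machine. It does not implement the permutation algorithm of
Lemma 5.1.4, which is not described here. The matrix layout p[i][j], the function names, and
the sample instance are assumptions made for illustration.

```python
def difference_vectors(p):
    """p[i][j] is the processing time of job j on machine i (m machines, n jobs).

    Returns, for each job j, the vector
    (p[0][j] - p[1][j], p[1][j] - p[2][j], ..., p[m-2][j] - p[m-1][j]).
    """
    m, n = len(p), len(p[0])
    return [[p[i][j] - p[i + 1][j] for i in range(m - 1)] for j in range(n)]


def permutation_makespan(p, order):
    """Completion time when every machine processes the jobs in `order`
    and each operation starts as early as possible."""
    m = len(p)
    finish = [0] * m  # finish[i]: time at which machine i completes its latest operation
    for j in order:
        ready = 0  # time at which job j leaves the previous machine
        for i in range(m):
            start = max(ready, finish[i])
            finish[i] = ready = start + p[i][j]
    return finish[-1]


# Hypothetical instance in which every machine has the same load (each row sums to 6).
p = [
    [1, 2, 3],
    [3, 1, 2],
    [2, 3, 1],
]
vectors = difference_vectors(p)
# With equal machine loads, the vectors sum to zero in every coordinate.
column_sums = [sum(v[i] for v in vectors) for i in range(len(p) - 1)]
makespan = permutation_makespan(p, [0, 1, 2])
```

Here the order 0, 1, 2 gives a schedule of length 10, against a machine load of 6; the theorem
concerns how small the gap between these two numbers can be made by choosing the order well.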

Another important result on shop scheduling comes, somewhat surprisingly, from the literature
on packet routing. Leighton, Maggs and Rao [41] have proposed the following model for the
routing of packets in a network: find paths for the packets, and then schedule the transmission
of the packets along these paths so that no two packets traverse the same edge simultaneously.
The primary objective is to minimize the time by which all packets have been delivered to their
destination.

The scheduling problem considered by Leighton, Maggs and Rao is simply the job shop
scheduling problem with each processing time $p_{ij} = 1$. They also added the restriction that
each path does not traverse any edge more than once, or in scheduling terminology, each job
has at most one operation on each machine. This restriction of the job shop problem remains
(strongly) $\mathcal{NP}$-hard [39]. The main result of Leighton, Maggs and Rao was to show that for their
special case of the job shop problem, there always exists a schedule of length $O(\Pi_{\max} + P_{\max})$.
Unfortunately, their result is not algorithmic, as it relies on a nonconstructive probabilistic
argument based on the Lovász Local Lemma. They also obtained a randomized algorithm that
delivers a schedule of length $O(\Pi_{\max} + P_{\max} \log n)$, with high probability.

We  can  now  state  our  main  theorem. 

Theorem 5.1.5 There exists a polynomial-time randomized algorithm for job shop scheduling,
that, with high probability, yields a schedule that is of length
$O\!\left(\frac{\log^2(m\mu)}{\log\log(m\mu)}\, C^*_{\max}\right)$.

Our  techniques  are  useful  not  only  for  the  job  shop  problem,  but  can  easily  be  extended  to 
the  general  problem  of  dag  shop  scheduling.  Another  important  generalization  is  the  situation 
where,  rather  than  having  m  different  machines,  there  are  m'  types  of  machines,  and  for  each 
type,  there  are  a  specified  number  of  identical  machines;  each  operation,  rather  than  being 
assigned  to  one  machine,  may  be  processed  on  any  machine  of  the  appropriate  type.  These 
problems  have  significant  practical  importance,  since  in  real-world  shops,  we  expect  that  a 
job  need  not  follow  a  total  order  and  that  the  shop  has  more  than  one  copy  of  many  of  its 
machines.  We  will  give  approximation  algorithms  with  the  same  performance  guarantees  for 
this  generalization  as  well. 

When $m$ and $\mu$ are constants, we can achieve much better approximation guarantees.
Specifically, we give a $(2 + \epsilon)$-approximation algorithm for this special case. Finally, we give parallel
approximation algorithms for all the scheduling models mentioned above and some improved
results for the open shop problem.

While all the algorithms that we give are polynomial-time, they are all also rather inefficient.
Most rely on the algorithms of Sevast'yanov; for example, his algorithm for job shop
scheduling takes $O((\mu m n)^2)$ time. As a result, we do not refer explicitly to running times
throughout the remainder of this chapter. In Chapter 6, we will carefully consider the running
time for a deterministic version of the algorithm presented in this chapter.

The  rest  of  this  chapter  is  organized  as  follows.  In  Section  5.2  we  extend  the  basic  technique 
of  Leighton,  Maggs  and  Rao  to  the  general  job  shop  problem.  In  Section  5.3  we  show  how  to 
scale  and  reduce  the  input  data  so  that  the  techniques  of  Section  5.2  yield  good  performance 
bounds.  In  Section  5.4  we  show  how  our  techniques  apply  to  more  general  problems.  We 
conclude  with  a  discussion  of  the  open  shop  problem  in  Section  5.5  and  some  open  problems  in 
Section  5.6. 


5.2  The  Basic  Algorithm 

In  this  section  we  extend  the  technique  due  to  Leighton,  Maggs  and  Rao  [41]  of  assigning 
random  delays  to  jobs  to  the  general  case  of  non-preemptive  job  shop  scheduling.  A  valid 
schedule  assigns  at  most  one  job  to  a  particular  machine  at  any  time,  and  schedules  each  job  on 
at  most  one  machine  at  any  time.  Their  approach,  for  the  special  case  of  unit-size  operations 
and  at  most  one  operation  of  each  job  on  each  machine,  was  to  first  create  a  schedule  that 
obeyed  only  the  second  constraint,  and  then  build  from  this  a  schedule  that  satisfies  both 
constraints  and  is  not  much  longer.  An  outline  of  their  strategy  follows: 

1. Define the oblivious schedule, where each job starts running at time 0 and runs
continuously until all of its operations have been completed. This schedule is of length $P_{\max}$, but
there may be times when more than one job is assigned to a particular machine.

2. Perturb this schedule by delaying the start of the first operation of each job by a random
integral amount chosen uniformly in $[0, \Pi_{\max}/\log n]$. The resulting schedule, with high
probability, has no more than $O(\log n)$ operations assigned to any machine at any time.

3. Reschedule each unit of time $t$ into $O(\log n)$ units of time during which each of the $O(\log n)$
operations scheduled for time $t$ is processed. The resulting (valid) schedule is of length
$O(P_{\max} \log n + \Pi_{\max})$.




Our  strategy  builds  upon  this  framework  of  Leighton,  Maggs  and  Rao.  Whereas  Step  1  is 
the  same  and  Step  2  differs  in  only  a  few  technical  details,  the  essential  difficulty  in  obtaining 
the  generalization  is  in  Step  3. 

2. Perturb this schedule by delaying the start of the first operation of each job by a random
integral amount chosen uniformly in $[0, \Pi_{\max}]$. The resulting schedule, with high
probability, has no more than $O\!\left(\frac{\log(n\mu)}{\log\log(n\mu)}\right)$ jobs assigned to any machine at any time.

3. “Spread” this schedule so that at each point in time all operations currently being
processed have the same size, and then “flatten” this into a schedule that has at most one
job per machine at any time.

For the analysis of Step 2, we assume that $p_{\max}$ is bounded above by a polynomial in $n$ and
$\mu$. In the next section we will show how to remove this assumption. As is usually the case, we
assume that $n \ge m$; analogous bounds can be obtained when $n < m$.

Lemma 5.2.1 Given a job shop instance in which $p_{\max}$ is bounded above by a polynomial in $n$ and
$\mu$, the strategy of delaying each job an initial integral amount chosen randomly and uniformly from
$[0, \Pi_{\max}]$ and then processing its operations in sequence yields an (invalid) schedule that has length
at most $\Pi_{\max} + P_{\max}$ and, with high probability, has no more than
$O\!\left(\frac{\log(n\mu)}{\log\log(n\mu)}\right)$ jobs scheduled on any machine during any unit of time.

Proof: Fix a time $t$ and a machine $m_i$; consider $p$ = Prob[at least $\tau$ units of processing are
scheduled on machine $m_i$ at time $t$]. There are at most $\binom{\Pi_{\max}}{\tau}$ ways to choose $\tau$ units of processing
from all those required on $m_i$. If we focus on a particular one of these $\tau$ units and a specific
time $t$, then the probability that it is scheduled at time $t$ is at most $1/\Pi_{\max}$, since we selected a
delay uniformly at random from among $\Pi_{\max}$ possibilities. If all $\tau$ units are from different jobs,
then the probability that they are all scheduled at time $t$ is at most $(1/\Pi_{\max})^{\tau}$, since the delays
are chosen independently. Otherwise, the probability that all $\tau$ are scheduled then is 0, since
it is impossible. Therefore,

$$p \le \binom{\Pi_{\max}}{\tau}\left(\frac{1}{\Pi_{\max}}\right)^{\tau} \le \left(\frac{e\,\Pi_{\max}}{\tau}\right)^{\tau}\left(\frac{1}{\Pi_{\max}}\right)^{\tau} = \left(\frac{e}{\tau}\right)^{\tau}.$$

If $\tau = k\,\frac{\log(n\mu)}{\log\log(n\mu)}$, then $p < (n\mu)^{-(k-1)}$. To bound the probability that any machine at any
time has more than $k\,\frac{\log(n\mu)}{\log\log(n\mu)}$ jobs using it, multiply $p$ by $P_{\max} + \Pi_{\max}$ for the number of time
units in the schedule, and by $m$ for the number of machines. Since we have assumed that $p_{\max}$
is bounded by a polynomial in $n$ and $\mu$, $P_{\max} + \Pi_{\max}$ is as well; choosing $k$ large enough yields
that, with high probability, no more than $k\,\frac{\log(n\mu)}{\log\log(n\mu)}$ jobs are scheduled for any machine during
any unit of time. ■
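
The random-delay step analyzed in this lemma can be simulated directly. The following is a
minimal sketch, not the implementation used in the thesis: the instance format (each job a
list of (machine, processing time) pairs), the function name, and the sample data are
assumptions made for illustration.

```python
import random
from collections import Counter


def delayed_schedule_congestion(jobs, max_delay, rng):
    """Delay each job by an integer drawn uniformly from [0, max_delay], run its
    operations back to back, and return the largest number of jobs that occupy
    one machine during one unit of time."""
    usage = Counter()  # (machine, time unit) -> number of jobs scheduled there
    for ops in jobs:
        t = rng.randint(0, max_delay)  # randint includes both endpoints
        for machine, p in ops:
            for unit in range(t, t + p):
                usage[(machine, unit)] += 1
            t += p
    return max(usage.values())


# Hypothetical instance; its maximum machine load is 8 (machine 1).
jobs = [
    [(0, 3), (1, 2), (0, 1)],
    [(1, 4), (0, 2)],
    [(1, 1), (1, 1)],
]
oblivious = delayed_schedule_congestion(jobs, 0, random.Random(0))  # no delays at all
perturbed = delayed_schedule_congestion(jobs, 8, random.Random(0))  # delays in [0, 8]
```

With no delays the result is the congestion of the oblivious schedule; with random delays
the result varies with the seed, and the lemma bounds how large it can be with high probability.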

In  the  special  case  of  unit-length  operations  treated  by  Leighton,  Maggs  and  Rao,  a  schedule 
S  of  length  L  that  has  at  most  c  jobs  scheduled  on  any  machine  at  any  unit  of  time  can  trivially 
be “flattened” into a valid schedule of length cL by replacing one unit of S's time with c units
of  time  in  which  we  run  each  of  the  jobs  that  was  scheduled  for  that  time  unit.  (See  Figure 
5.1.) 
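
For the unit-length case, the flattening step is short enough to write out. In this sketch a
schedule is a list of time units, each mapping a machine to the list of jobs scheduled on it
during that unit; this representation, the function name, and the sample data are assumptions
made for illustration.

```python
def flatten_unit_schedule(schedule, c):
    """schedule[t] maps each machine to the list of jobs (at most c of them)
    scheduled on it during time unit t; every operation has unit length.

    Returns a schedule of length c * len(schedule) in which each machine runs
    at most one job per time unit."""
    flat = []
    for slot in schedule:
        for s in range(c):
            # In the s-th copy of this time unit, each machine runs its s-th job.
            flat.append({mach: js[s] for mach, js in slot.items() if s < len(js)})
    return flat


# Hypothetical schedule of length 2 with at most c = 2 jobs per machine per unit.
schedule = [
    {"m1": ["J1", "J2"], "m2": ["J3"]},
    {"m1": ["J3"], "m2": ["J1", "J2"]},
]
flat = flatten_unit_schedule(schedule, c=2)
```

Because each job appears on at most one machine in any original time unit, it also appears
on at most one machine in each of the c copies of that unit, so the flattened schedule is valid.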

For  preemptive  job  shop  scheduling,  where  the  processing  of  an  operation  may  be  interrupted, 
each  unit  of  an  operation  can  be  treated  as  a  unit-length  operation  and  a  schedule  that  has 
multiple  operations  scheduled  simultaneously  on  a  machine  can  easily  be  flattened  into  a  valid 
schedule.  This  strategy  is  not  possible  for  non-preemptive  job  shop  scheduling,  and  in  fact  it 
seems to be more difficult to flatten the schedule in this case. We give an algorithm that takes
a schedule of length $L$ with at most $c$ operations scheduled on one machine at any time and
produces a schedule of length $O(cL \log p_{\max})$.

Lemma 5.2.2 Given a schedule $S_0$ of length $L$ that has at most $c$ jobs scheduled on one machine
during any unit of time, there exists a polynomial-time algorithm that produces a valid schedule of
length $O(cL \log p_{\max})$.

Proof: To begin, we round each processing time $p_{ij}$ up to the next power of 2 and denote the
rounded times by $p'_{ij}$; that is, $p'_{ij} = 2^{\lceil \log_2 p_{ij} \rceil}$. Let $p'_{\max} = \max_{i,j} p'_{ij}$. From $S_0$, one can obtain
a schedule $S$ that uses the modified $p'_{ij}$ and is at most twice as long as $S_0$. Furthermore, an
optimal schedule for the new problem is no more than twice as long as an optimal schedule for
the original problem.

A block is an interval of a schedule with the property that each operation that begins during
this interval has length no more than that of the entire interval. (Note that this does not mean
that the operation finishes within the interval.) We can divide S into $\lceil L/p'_{\max} \rceil$ consecutive blocks
of size $p'_{\max}$. We will give a recursive algorithm that reschedules - “spreads” - each block of size
p (where p is a power of 2) into a sequence of schedule fragments of total length p log p. The
operations scheduled in a fragment of length T all have length T and start at the beginning
of the fragment. This algorithm takes advantage of the fact that if an operation of length p is
scheduled to begin in a block of size p, then that job is not scheduled on any other machine
until after this block. Therefore, that operation can be scheduled to start after all the smaller
operations in the block have finished.

To reschedule a block B of size $p'_{\max}$, we first construct the final fragment (which has
length $p'_{\max}$); then we construct the preceding fragments by recursive calls of the algorithm.
For each operation of length $p'_{\max}$ that begins in B, reschedule that operation to start at the
beginning of the final fragment, and delete it from B. Now each operation that still starts in
B has length at most $p'_{\max}/2$, so B can be subdivided into two blocks, $B_1$ and $B_2$, each of size
$p'_{\max}/2$, and we can recurse on each. See Figure 5.2.
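
The recursive spreading step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: operations are given as hypothetical tuples (start within the block, length, label), all lengths are powers of 2 no larger than the block size, and the names `spread` and `total_length` are ours rather than the thesis's. Machines are ignored here, since the fragment boundaries are the same on every machine.

```python
from typing import List, Tuple

Op = Tuple[int, int, str]          # (start within block, length, label)
Fragment = Tuple[int, List[str]]   # (fragment length, labels that start in it)


def spread(ops: List[Op], p: int) -> List[Fragment]:
    """Spread one block of size p (a power of 2) into a list of fragments.

    Operations of length exactly p go to the final fragment; the rest have
    length at most p/2 and are handled by recursing on the two half-blocks.
    """
    assert p >= 1 and p & (p - 1) == 0, "block size must be a power of 2"
    assert all(0 <= s < p and length <= p for s, length, _ in ops)
    final = [name for _, length, name in ops if length == p]
    if p == 1:
        return [(1, final)]
    half = p // 2
    rest = [op for op in ops if op[1] < p]
    b1 = [(s, length, name) for s, length, name in rest if s < half]
    b2 = [(s - half, length, name) for s, length, name in rest if s >= half]
    # Fragments of the two half-blocks come first, the final fragment last.
    return spread(b1, half) + spread(b2, half) + [(p, final)]


def total_length(fragments: List[Fragment]) -> int:
    """Total length of the spread block (sum of fragment lengths)."""
    return sum(length for length, _ in fragments)
```

For a block of size 4 this produces fragments of lengths 1, 1, 2, 1, 1, 2, 4, for a total of 12; two such blocks give 24, which matches the length quoted in the caption of Figure 5.2.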

Figure 5.2: (a) The initial greedy schedule of length 8, with $p'_{\max} = 4$. (b) The first level of
spreading. All jobs of length 4 have been put in the final fragments. We must now recurse on
$B_1$ and $B_2$ with $p'_{\max} = 2$. (c) The final schedule of length $8 \log_2 8 = 24$.

The recurrence equation that describes the total length of the fragments produced from a
block of size T is $f(T) = 2f(T/2) + T$; $f(1) = 1$. Thus $f(T) = \Theta(T \log T)$, and each block B in S
of size $p'_{\max}$ is spread into a schedule of length $\Theta(p'_{\max} \log p'_{\max})$. By spreading the schedule S, we
produce a new schedule S' that satisfies the following conditions:

1.  At  any  time  in  S',  all  operations  scheduled  have  the  same  length.  Furthermore,  any  two 
operations  either  start  at  the  same  time  or  do  not  overlap. 

2. If S has at most c jobs scheduled on one machine at any time, then S' has at most c jobs
scheduled on one machine at any time as well.

3. S' schedules a job on at most one machine at any time.

4. S' does not schedule the ith operation of job $J_j$ until the first $i - 1$ are completed.

Condition  1  is  satisfied  by  each  pair  of  operations  on  the  same  machine  by  the  definition  of 
spreading  and  it  is  satisfied  by  each  pair  of  operations  on  different  machines  because  the  division 
of  time  into  fragments  is  the  same  on  all  machines.  To  prove  condition  2,  note  that  operations 
of  length  T  that  are  scheduled  at  the  same  time  on  the  same  machine  in  the  expanded  schedule 
started  in  the  same  block  of  size  T  on  that  machine.  Since  they  all  must  have  been  scheduled 
during  the  last  unit  of  time  of  that  block,  there  can  be  at  most  c  of  them. 

To  prove  condition  3,  note  that  if  a  job  is  scheduled  by  S'  on  two  machines  simultaneously, 
then  it  must  have  been  scheduled  by  S  to  start  two  operations  of  length  T  in  the  same  block 
of  length  T  on  two  different  machines.  Consequently,  it  was  scheduled  by  S  on  two  machines 
during the last unit of time of that block, which violates the properties of S.

Finally,  we  verify  condition  4  by  first  noting  that  if  two  operations  of  a  job  are  in  different 
blocks of size $p'_{\max}$ in S, then they are certainly rescheduled in the correct order. Therefore
it  suffices  to  focus  on  the  schedule  produced  from  one  block.  Within  a  block,  if  an  operation 
is  rescheduled  to  the  final  fragment,  then  it  is  the  last  operation  for  that  job  in  that  block. 
Therefore  S'  does  not  schedule  the  ith  operation  of  job  Jj  until  the  first  i  —  1  are  completed. 

The schedule S' can easily be flattened to a schedule that obeys the constraint of one
job  per  machine  at  any  time,  since  c  operations  of  length  T  that  start  at  the  same  time  can 
just  be  executed  one  after  the  other  in  total  time  cT.  Since  what  we  are  doing  is  effectively 
synchronizing  the  entire  schedule  block  by  block,  it  is  important  when  flattening  the  schedule  to 
make  each  machine  wait  enough  time  for  all  machines  to  process  all  operations  of  that  fragment 
length,  even  if  some  machines  have  no  operations  of  that  length  in  that  fragment. 

The schedule S' has length $L \log p'_{\max}$; therefore the flattened schedule has length $cL \log p'_{\max}$.

■ 
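
The flattening step described at the end of the proof can be illustrated with a short Python sketch. It assumes a hypothetical representation in which the spread schedule is a list of fragments, each given as a fragment length plus a mapping from machine name to the operations that start on that machine in the fragment; the function name `flatten` is illustrative. Every machine waits until the most heavily loaded machine has finished the fragment, which is the synchronization the proof asks for.

```python
from typing import Dict, List, Tuple

Fragment = Tuple[int, Dict[str, List[str]]]  # (length T, machine -> operations)


def flatten(fragments: List[Fragment]) -> Tuple[Dict[str, int], int]:
    """Return the start time of every operation and the flattened length.

    Within a fragment of length T, the operations on one machine run one
    after the other; the fragment as a whole occupies width * T time units,
    where width is the largest number of operations on any single machine.
    """
    start: Dict[str, int] = {}
    clock = 0
    for length, by_machine in fragments:
        width = max((len(ops) for ops in by_machine.values()), default=0)
        for ops in by_machine.values():
            for rank, op in enumerate(ops):
                start[op] = clock + rank * length
        clock += width * length  # all machines wait for the slowest one
    return start, clock
```

Since each fragment holds at most c operations per machine, a fragment of length T takes at most cT time units, so the flattened schedule is at most c times as long as the spread one.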

We note in passing that the inclusion of release dates into the problem does not affect the
quality of our bounds at all. The release dates can either be directly incorporated into the
probabilistic analysis of Lemma 5.2.1, or we can view each release date as one additional initial
operation on some (imaginary) machine.


5.3  Reducing  the  Problem 

In the previous section we showed how to produce, with high probability, a schedule of length

$$O\left((\Pi_{\max} + P_{\max}) \frac{\log(n\mu)}{\log\log(n\mu)} \log p_{\max}\right),$$

under the assumption that $p_{\max}$ was bounded above by a polynomial in n and $\mu$. Since

$$\Pi_{\max} + P_{\max} = O(\max\{\Pi_{\max}, P_{\max}\}),$$

this schedule is within a factor of $O\left(\frac{\log(n\mu)}{\log\log(n\mu)} \log p_{\max}\right)$ of optimal. In this section, we first
remove the assumption that $p_{\max}$ is bounded above by a polynomial in n and $\mu$ by showing that
we can reduce the general problem to that special case while only sacrificing a constant factor
in the approximation, thereby yielding an $O\left(\frac{\log^2(n\mu)}{\log\log(n\mu)}\right)$-approximation algorithm. Then we show
how to sacrifice another constant factor to reduce to the special case that n is polynomially
bounded in m and $\mu$. Combining these two results, we conclude that we can reduce the general
job shop problem to the case where n and $p_{\max}$ are polynomially bounded in m and $\mu$, while
changing the performance guarantee by only a constant.

5.3.1 Reducing $p_{\max}$

First we show that we can reduce the problem to one where $p_{\max}$ is bounded by a polynomial
in n and $\mu$. Let $\omega = |O|$ be the total number of required operations. Note that $\omega \le n\mu$. Round
down each $p_{ij}$ to the nearest multiple of $p_{\max}/\omega$, denoted by $p'_{ij}$. Now there are at most $\omega$
distinct values of $p'_{ij}$ and they are all multiples of $p_{\max}/\omega$. Therefore we can treat the $p'_{ij}$ as
integers in $\{0, \ldots, \omega\}$; a schedule for this problem can be trivially rescaled to a schedule S' for
the actual $p'_{ij}$. (Note that assigning $p'_{ij} = 0$ does not mean that this operation does not exist;
instead, it should be viewed as an operation that takes an arbitrarily small amount of time.) Let
L denote the length of S'. We claim that S' for this reduced problem can be interpreted as a
schedule for the original operations that has length at most $L + p_{\max}$. When we adjust the $p'_{ij}$
up to the original $p_{ij}$, we add an amount that is at most $p_{\max}/\omega$ to each operation. Since the
length of a schedule is determined by a critical path through the operations and there are $\omega$
operations, we add a total amount of at most $p_{\max}$ to the length of the schedule; thus, the new
schedule has length at most $L + p_{\max} \le L + C^*_{\max}$. Therefore we have rounded a general instance I of
the job shop problem to an instance I' that can be treated as having $p_{\max} = O(n\mu)$; further, a
schedule for I' yields a schedule for I that is no more than $C^*_{\max}$ longer. Thus, we have proved
the following lemma:

Lemma 5.3.1 There exists a polynomial-time algorithm that transforms any instance of the job
shop scheduling problem into one with $p_{\max} = O(n\mu)$ with the property that a schedule for the
modified instance of length $kC^*_{\max}$ can be converted in polynomial time to a schedule for the original
instance of length $(k + 1)C^*_{\max}$.
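
The rounding used to prove this lemma is easy to try out. Below is a minimal Python sketch, assuming positive integer processing times stored in a dictionary keyed by a hypothetical operation label; the function name `round_down_times` is illustrative. It returns the rounded times as integers in the range 0 to $\omega$ together with the rounding unit $p_{\max}/\omega$.

```python
from fractions import Fraction
from typing import Dict, Tuple


def round_down_times(p: Dict[str, int]) -> Tuple[Dict[str, int], Fraction]:
    """Round each processing time down to a multiple of p_max / omega.

    The returned integers count how many units of size p_max / omega each
    rounded operation occupies, so they all lie between 0 and omega.
    """
    omega = len(p)                      # total number of operations
    p_max = max(p.values())
    unit = Fraction(p_max, omega)
    # floor(t / unit) computed exactly with integer arithmetic
    scaled = {op: (t * omega) // p_max for op, t in p.items()}
    return scaled, unit
```

Each operation loses less than one unit, so restoring the original times along any chain of at most $\omega$ operations adds at most $p_{\max}$ in total, as the argument above requires.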

5.3.2  Reducing  the  Number  of  Jobs 

To reduce an arbitrary instance of job shop scheduling to one with a number of jobs polynomial
in m and $\mu$, we divide the jobs into big and small jobs. We say that job $J_j$ is big if it has an
operation of length more than $\Pi_{\max}/(2m\mu^3)$; otherwise, we call the job small. For the instance
consisting of just the small jobs, let $\Pi'_{\max}$ and $p'_{\max}$ denote the maximum machine load and
operation length, respectively. Using the algorithm of [58] described in the introduction, we
can, in time polynomial in the input size, produce a schedule of length $\Pi'_{\max} + 2m\mu^3 p'_{\max}$ for
this instance. Since $p'_{\max}$ is at most $\Pi_{\max}/(2m\mu^3)$ and $\Pi'_{\max} \le \Pi_{\max}$, we get a schedule that has
length no more than $2\Pi_{\max}$. Thus, an algorithm that produces a schedule for the big jobs that
is within a factor of k of optimal yields a (k + 2)-approximation algorithm. Note that there
can be at most $2m^2\mu^3$ big jobs, since otherwise there would be more than $m\Pi_{\max}$ units of
processing to be divided amongst m machines, which contradicts the definition of $\Pi_{\max}$. Thus
we have shown:



Lemma 5.3.2 There exists a polynomial-time algorithm that transforms any instance of the
job shop scheduling problem into one with $O(m^2\mu^3)$ jobs with the property that a schedule for the
modified instance of length $kC^*_{\max}$ can be converted in polynomial time to a schedule for the original
instance of length $(k + 2)C^*_{\max}$.
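
The partition into big and small jobs can be sketched as follows. This is a hedged Python illustration: jobs are assumed to be lists of hypothetical (machine, length) pairs, and the divisor that defines the threshold is passed in as the parameter `divisor` rather than hard-coded, so a job is big when one of its operations is longer than the maximum machine load divided by `divisor`. The name `split_jobs` is ours.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Job = List[Tuple[str, int]]  # sequence of (machine, processing time)


def split_jobs(jobs: Dict[str, Job], divisor: int) -> Tuple[Set[str], Set[str], int]:
    """Split jobs into big and small ones relative to the maximum machine load."""
    load: Dict[str, int] = defaultdict(int)
    for ops in jobs.values():
        for machine, length in ops:
            load[machine] += length
    pi_max = max(load.values())
    # length > pi_max / divisor, written without division to stay exact
    big = {name for name, ops in jobs.items()
           if any(length * divisor > pi_max for _, length in ops)}
    small = set(jobs) - big
    return big, small, pi_max
```

Because the total processing over m machines is at most m times the maximum load, fewer than m times `divisor` jobs can be big, which is the counting argument used in the text.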

From  the  results  of  the  previous  two  sections  we  can  conclude  that: 


Theorem 5.3.3 There exists a polynomial-time randomized algorithm for job shop scheduling
that, with high probability, yields a schedule that is of length $O\left(\frac{\log^2(m\mu)}{\log\log(m\mu)} C^*_{\max}\right)$.


Proof: In Section 5.2 we showed how to produce a schedule of length

$$O\left((\Pi_{\max} + P_{\max}) \frac{\log(n\mu)}{\log\log(n\mu)} \log p_{\max}\right)$$

under the assumption that $p_{\max}$ was bounded above by a polynomial in n and $\mu$. From Lemmas
5.3.1 and 5.3.2 we know that we can reduce the problem to one where n and $p_{\max}$ are polynomial
in m and $\mu$, while adding only a constant to the factor of approximation. Since now
$\log p_{\max} = O(\log(m\mu))$ and $\log n = O(\log(m\mu))$, our algorithm produces a schedule of length

$$O\left(\frac{\log^2(m\mu)}{\log\log(m\mu)} C^*_{\max}\right).$$

■


Note that when $\mu$ is bounded by a polynomial in m, the bound only depends on m. In
particular,  this  implies  the  following  corollary: 


Corollary 5.3.4 There exists a polynomial-time randomized algorithm for flow shop scheduling
that, with high probability, yields a schedule that is of length $O\left(\frac{\log^2 m}{\log\log m} C^*_{\max}\right)$. ■

We now briefly address the issue of a parallel version of our shop scheduling algorithm.
NC is the class of problems that can be solved using a polynomial number of processors and
polylogarithmic time. RNC is the class that, in addition, allows each processor to generate a
log n bit random number at each step. Except for the use of Sevast’yanov’s algorithm, all these
techniques can be carried out in RNC. We assign one processor to each operation. The rounding
in the proof of Lemma 5.2.2 can be done in NC. We set the random delays and inform each
processor about the delay of its job. By summing the values of $p_{ij}$ for all of its job’s operations,
each processor can calculate where its operation is scheduled with the delays and then where it
is scheduled in the recursively spread-out schedule. These sums can be calculated via parallel
prefix operations. With simple NC techniques we can assign to each operation a rank among all
those operations that are scheduled to start at the same time on its machine, and thus flatten
the spread-out schedule to a valid schedule.

Corollary 5.3.5 There exists an RNC algorithm for job shop scheduling that, with high
probability, yields a schedule that is of length $O\left(\frac{\log^2(n\mu)}{\log\log(n\mu)} C^*_{\max}\right)$.

5.3.3  A  Fixed  Number  of  Machines 

Sevast’yanov’s algorithm for the job shop problem can be viewed as a $(1 + m\mu^3)$-approximation
algorithm, which, when m and $\mu$ are constant, is an O(1)-approximation algorithm; that is,
it delivers a solution within a constant factor of the optimum. The technique of partitioning
the set of jobs by size can be applied to give a much better performance guarantee in this
case. Now, call a job $J_j$ big if there is an operation $O_{ij}$ with $p_{ij} > \epsilon\Pi_{\max}/(m\mu^3)$, where $\epsilon$ is an
arbitrary positive constant. There are at most $m^2\mu^3/\epsilon$ big jobs, and since m, $\mu$ and $\epsilon$ are fixed,
the number of big jobs is constant.

Now use Sevast’yanov’s algorithm to schedule all the small jobs. The resulting schedule is
of length at most $(1 + \epsilon)\Pi_{\max}$. There are only a constant (albeit a huge constant) number of
ways to schedule the big jobs. Therefore the best one can be selected in polynomial time and
executed after the schedule of the small jobs. The additional length of this part is no more than
$C^*_{\max}$.

Thus  we  have  shown: 

Theorem 5.3.6 For the job shop scheduling problem where both m and $\mu$ are fixed, there is a
polynomial-time algorithm that produces a schedule of length $\le (2 + \epsilon)C^*_{\max}$.

5.4  Applications  to  More  General  Scheduling  Problems 

The fact that the quality of our approximations is based solely on the lower bounds $\Pi_{\max}$ and
$P_{\max}$ makes it quite easy to extend our techniques to the more general problem of dag shop
scheduling. We define $\Pi_{\max}$ and $P_{\max}$ exactly the same way, and $\max\{\Pi_{\max}, P_{\max}\}$ remains a
lower bound for the length of any schedule. We can convert this dag shop scheduling problem
to a job shop problem by selecting for each job an arbitrary total order that is consistent with
its partial order. $\Pi_{\max}$ and $P_{\max}$ have the same values for both problems. Therefore, a schedule
of length $\rho \cdot (\Pi_{\max} + P_{\max})$ for this job shop instance is a schedule for the original dag shop
scheduling instance of length $O(\rho C^*_{\max})$.

A further generalization to which our techniques apply is where, rather than m different
machines, we have m' types of machines, and for each type we have a specified number of
identical machines of that type. Instead of requiring an operation to run on a particular machine,
an operation now may run on any one of these identical copies. The value $P_{\max}$ remains a lower
bound on the length of any schedule for this problem. The value $\Pi_{\max}$, which was a lower
bound for the job shop problem, must be replaced, since we do not have a specific assignment
of operations to machines, and the sum of the processing times of all operations assigned to a
type is not a lower bound. Let $S_i$, $i = 1, \ldots, m'$, denote the sets of identical machines, and let
$\Pi(S_i)$ be the sum of the lengths of the operations that run on $S_i$. Our strategy is to convert
this problem to a job shop problem by assigning operations to specific machines in such a way
that the maximum machine load is within a constant factor of the fundamental lower bounds
for this problem. To obtain a lower bound on the maximum machine load, the best we can do
is to distribute the operations evenly across machines in a set, and thus


$$\Pi_{\mathrm{avg}} = \max_{S_i} \frac{\Pi(S_i)}{|S_i|}$$


is certainly a lower bound on the maximum machine load. Furthermore, we cannot split
operations, so $p_{\max}$ is also a lower bound. We will now describe how to assign operations to
machines so that the maximum machine load of the resulting job shop scheduling problem is
at most $2\Pi_{\mathrm{avg}} + p_{\max}$. A schedule for the resulting job shop problem of length $\rho \cdot (\Pi_{\max} + P_{\max})$
yields a solution for the more general problem of length $O(\rho \cdot (\Pi_{\mathrm{avg}} + P_{\max}))$. Sevast’yanov [58]
used a somewhat more complicated reduction to handle a slightly more general setting.

For each operation $O_{ij}$ to be processed by a machine in $S_k$, if $p_{ij} > \Pi(S_k)/|S_k|$, assign $O_{ij}$ to
one machine in $S_k$. There are certainly enough machines in $S_k$ to make this assignment and all
such operations contribute at most $p_{\max}$ to the maximum machine load. Those operations not
yet assigned each have length at most $\Pi(S_k)/|S_k|$ and have total length $\le \Pi(S_k)$. Therefore,
these operations can be assigned easily to the remaining machines so that less than $2\Pi(S_k)/|S_k|$
processing units are assigned to each machine. Combining these two bounds, we get an upper
bound of $2\Pi_{\mathrm{avg}} + p_{\max}$ on the maximum machine load, which is within a constant factor of the
lower bound of $\max\{\Pi_{\mathrm{avg}}, p_{\max}\}$.
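
One way to realize such an assignment for a single set of identical machines is sketched below in Python. It is a variant chosen for simplicity, not necessarily the exact procedure intended in the text: operations longer than the average load each get a machine of their own, and the remaining operations are placed greedily on whichever machine currently carries the least load from short operations. The function name `assign_to_copies` and the list-of-lengths input are assumptions made for the example.

```python
from typing import List


def assign_to_copies(lengths: List[int], k: int) -> List[List[int]]:
    """Assign operation lengths to k identical machines.

    Guarantees a maximum load of at most 2 * sum(lengths) / k + max(lengths).
    """
    total = sum(lengths)
    big = [p for p in lengths if p * k > total]      # p > total / k
    small = [p for p in lengths if p * k <= total]
    machines: List[List[int]] = [[] for _ in range(k)]
    # Fewer than k operations can exceed the average, so this never overflows.
    for i, p in enumerate(big):
        machines[i].append(p)
    small_load = [0] * k
    for p in small:
        # The least-loaded machine has short-operation load at most total / k.
        i = min(range(k), key=lambda idx: small_load[idx])
        machines[i].append(p)
        small_load[i] += p
    return machines
```

Each machine ends up with at most one long operation plus short operations totalling at most twice the average, which gives the same $2\Pi_{\mathrm{avg}} + p_{\max}$ bound as above.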

Theorem 5.4.1 There exists a polynomial-time randomized algorithm for dag shop scheduling
with identical copies of machines that, with high probability, yields a schedule that is of length at

Corollary 5.4.2 There exists an RNC algorithm for dag shop scheduling with identical copies of
machines that, with high probability, yields a schedule that is of length at most


5.5  The  Open  Shop  Problem 

Recall that in the open shop problem the operations of a job can be executed in any order.
Fiala [16] has shown that if $\Pi_{\max} \ge (16m \log m + 21m)p_{\max}$, then $C^*_{\max}$ is just $\Pi_{\max}$ and there
is a polynomial-time algorithm to find an optimal schedule, but in general this problem is
strongly NP-complete. We will show that, in contrast to the job and flow shop problems, a
simple greedy strategy yields a fairly good approximation to the optimal open shop schedule.
Consider the algorithm that, whenever a machine is idle, assigns to it any job that has not
yet been processed on that machine and is not currently being processed on another machine.
Anna Racsmany [6] has observed that this greedy algorithm delivers a schedule of length at
most $\Pi_{\max} + (m - 1)p_{\max}$. We can adapt her proof to show that the greedy algorithm
delivers a schedule that is no longer than a factor of 2 times optimal. In fact, Wein [68] has
shown that even with release dates the greedy algorithm is a 2-approximation algorithm. He
has also shown that this bound is fairly tight, since he can produce schedules of length (2 −
times optimal. We include his proof here.

Theorem 5.5.1 The greedy algorithm for the open shop problem is a 2-approximation algorithm,
even when each job $J_j$ has an associated release date $r_j$ on which it becomes available for processing.

Proof: Consider the machine $m_k$ that finishes last in the greedy schedule. This machine is
active sometimes, idle sometimes, and finishes by completing some job $J_j$. Since the schedule
is greedy, whenever $m_k$ is idle, $J_j$ is either being processed by some other machine or has not
yet been released. Therefore, the idle time is at most $\sum_{i \ne k} p_{ij} + r_j < P_j + r_j$. Thus, machine
$m_k$ is processing for at most $\Pi_{\max}$ units of time and is idle for less than $P_j + r_j$ units of time.
This implies $C_{\max} < \Pi_{\max} + P_j + r_j$. However, the value $P_j + r_j$ is a lower bound on the length
of the schedule, since no processing of job $J_j$ could start until time $r_j$. Since $\Pi_{\max}$ is also a
lower bound, $C_{\max} < 2C^*_{\max}$. ■
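
The greedy rule analyzed in this proof can be simulated directly. The following is a minimal Python sketch, assuming integer processing times and release dates and a clock that advances in unit steps; the function name `greedy_open_shop` and the matrix layout `p[j][i]` (time of job j on machine i, with 0 meaning no operation) are illustrative choices, not taken from the thesis.

```python
from typing import Dict, List, Optional, Tuple


def greedy_open_shop(
    p: List[List[int]], release: Optional[List[int]] = None
) -> Tuple[Dict[Tuple[int, int], int], int]:
    """Simulate the greedy open shop rule; return start times and makespan."""
    n, m = len(p), len(p[0])
    job_free = list(release) if release is not None else [0] * n
    mach_free = [0] * m
    pending = {(j, i) for j in range(n) for i in range(m) if p[j][i] > 0}
    start: Dict[Tuple[int, int], int] = {}
    t = 0
    while pending:
        for i in range(m):
            if mach_free[i] > t:
                continue  # machine i is still busy
            for j in range(n):
                # Job j must be released, idle, and still need machine i.
                if (j, i) in pending and job_free[j] <= t:
                    start[(j, i)] = t
                    pending.remove((j, i))
                    job_free[j] = mach_free[i] = t + p[j][i]
                    break
        t += 1
    makespan = max((s + p[j][i] for (j, i), s in start.items()), default=0)
    return start, makespan
```

On small instances the resulting makespan can be compared against the lower bounds used in the proof, namely the maximum machine load and the largest value of $P_j + r_j$.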

Using a slightly different (non-greedy) strategy, we can derive another algorithm that
achieves a schedule of length $O(C^*_{\max} \log n)$. This algorithm is also easily parallelizable, thus
putting the problem of finding an O(log n)-approximation to the open shop scheduling problem
in NC.

We define the jobs graph, which is a bipartite graph that represents an instance of the open
shop problem. One side of the bipartition contains m nodes, one for each machine, whereas the
other side contains n nodes, one for each job. If job $J_j$ has an operation on machine $m_i$, then the
jobs graph contains an edge between the respective nodes.

First consider the case when all operations have the same size, 1. Let $\Delta$ be the maximum
degree of any node in the jobs graph. Then $\Delta$ is a lower bound on the length of
the optimal schedule for this problem. However, since this graph is bipartite with maximum
degree $\Delta$, it can be edge-colored using exactly $\Delta$ colors. So we edge-color the graph, and then
schedule the operations in each color class separately, thereby producing a schedule of length
$\Delta$, which is optimal. As long as there is at least one processor per operation, this coloring can
be done in NC using the edge-coloring algorithm of Lev, Pippenger, and Valiant [45].
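
To make the coloring step concrete, here is a sequential Python sketch of bipartite edge coloring with $\Delta$ colors. It uses the classical alternating-path argument, not the parallel algorithm of [45], and it assumes a simple graph whose left nodes are machines and right nodes are jobs, both numbered from 0; the function name `edge_color` is illustrative. Color c can then be read as time slot c of the unit-operation schedule.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]  # (left node, right node)


def edge_color(n_left: int, n_right: int, edges: List[Edge]) -> Tuple[Dict[Edge, int], int]:
    """Color the edges of a simple bipartite graph with max-degree many colors."""
    deg_l, deg_r = [0] * n_left, [0] * n_right
    for u, v in edges:
        deg_l[u] += 1
        deg_r[v] += 1
    delta = max(deg_l + deg_r, default=0)
    # at_left[u][c] is the right endpoint of u's edge colored c (or None).
    at_left: List[List[Optional[int]]] = [[None] * delta for _ in range(n_left)]
    at_right: List[List[Optional[int]]] = [[None] * delta for _ in range(n_right)]
    for u, v in edges:
        a = at_left[u].index(None)    # a color free at u
        b = at_right[v].index(None)   # a color free at v
        if a != b:
            # Collect the path from v that alternates colors a, b, a, ...
            path = []
            x, on_right, c = v, True, a
            while True:
                nxt = at_right[x][c] if on_right else at_left[x][c]
                if nxt is None:
                    break
                path.append((nxt, x, c) if on_right else (x, nxt, c))
                x, on_right, c = nxt, not on_right, (b if c == a else a)
            # Swap a and b along the path: clear first, then rewrite.
            for l, r, c in path:
                at_left[l][c] = None
                at_right[r][c] = None
            for l, r, c in path:
                other = b if c == a else a
                at_left[l][other] = r
                at_right[r][other] = l
        at_left[u][a] = v             # a is now free at both endpoints
        at_right[v][a] = u
    coloring = {(u, v): c for u in range(n_left)
                for c, v in enumerate(at_left[u]) if v is not None}
    return coloring, delta
```

The alternating path starting at v can never reach u, because it enters left nodes only along edges of color a and u has no such edge; this is why the swap frees color a at v without disturbing u.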

We can extend this algorithm to one that solves the general open shop problem by first using
the techniques of Section 5.3.1 to reduce the problem to the case where all operations have sizes
polynomial in n, and then by rounding the operation sizes so they are all powers of 2. Now
there are only O(log n) different operation sizes. We schedule each one separately, using the
edge-coloring based strategy described in the previous paragraph. The schedule we get for any
particular size is optimal for the operations of that size, and hence each of the O(log n) schedules
we produce has length $O(C^*_{\max})$. Concatenating these schedules together, and observing that
the rounding can easily be done in NC, we obtain the following theorem:

Theorem 5.5.2 An open shop schedule of length $O(C^*_{\max} \log n)$ can be found in NC.

We can also use the results on open shop to get a simple bound for general dag shop
scheduling. For certain classes of problems this approach gives better bounds than those given
in 5.3.3. Consider the case when the constraints for each job form a dag. We will refer to each
operation as a node of the dag. We define the depth of a node to be the distance of that node from
the root and the depth of a dag to be the length of the longest root-leaf path in the dag. If each
job j has an associated dag $D_j$, let $d_j$ be the depth of $D_j$.

Theorem 5.5.3 Given a dag scheduling problem, let $d = \max_j d_j$, the maximum depth of any
of the dags. Then there exists an algorithm that finds a schedule of length $O(dC^*_{\max})$.

Proof: For each dag $D_j$, let $D_j^i$ be the set of all operations at level i in the dag. Our algorithm
consists of d iterations. In iteration i, we consider the scheduling problem consisting of, for each
job j, all operations in $D_j^i$. The key observation is that in a dag, all operations in any level are
independent, i.e., there are no precedence constraints among them. Hence the scheduling problem
for each i is an open shop problem. Further, since each of these problems is a subproblem of the
original dag scheduling problem, by Theorem 5.5.1, there certainly exists a schedule for the
operations at level i of length $O(C^*_{\max})$, where $C^*_{\max}$ is the length of the optimal schedule for the
original dag scheduling problem. Concatenating the d schedules yields a schedule satisfying the
conditions of the theorem. ■

If d is constant, this approach yields a constant factor approximation. Moreover, if each level
has about 1/d of the processing of each job and 1/d of the processing of each machine, we also
get a constant factor approximation, regardless of how many levels we have.

5.6  Conclusions  and  Open  Problems 

We have given the first polynomial-time polylog-approximation algorithms for minimizing the
maximum completion time for the problems of job shop scheduling, flow shop scheduling, dag
shop scheduling and a generalization of dag shop scheduling in which there are groups of
identical machines. The most basic question to be pursued is the development of approximation
algorithms with even better performance guarantees. It is our belief that the $O(\log p_{\max})$ factor
that is introduced by the techniques of Section 5.2 can be improved upon, perhaps even by a
simple greedy method. Such methods have proved frustratingly difficult to analyze, however.
The other logarithmic factor in the performance bound seems much more difficult to improve
upon.

An interesting consequence of our results is the following observation about the structure
of shop scheduling problems. Assume we have a set of jobs that need to run on a set of
machines. We know that any schedule for the associated open shop problem must have length
$\Omega(\Pi_{\max} + P_{\max})$. Furthermore, we know that no matter what type of partial ordering we impose
on the operations of each job we can produce a schedule of length $O\left((\Pi_{\max} + P_{\max}) \frac{\log^2(m\mu)}{\log\log(m\mu)}\right)$.
Hence for any instance of the open shop problem, we can impose an arbitrary partial order on
the operations of each job and increase the length of the optimal schedule by a factor of no
more than $O\left(\frac{\log^2(m\mu)}{\log\log(m\mu)}\right)$.

An interesting combinatorial question is, “Can the imposition of a partial order really make
the optimal schedule that much longer than $O(\Pi_{\max} + P_{\max})$?” In other words, how good are
$\Pi_{\max}$ and $P_{\max}$ as lower bounds? We have seen that in two interesting special cases, job shop
scheduling with unit-length operations and open shop scheduling, there is a schedule of length
$O(\Pi_{\max} + P_{\max})$. Does there always exist an $O(\Pi_{\max} + P_{\max})$ schedule for the general job, flow
or dag shop scheduling problems?

Beyond  this,  there  are  several  interesting  questions  raised  by  this  work,  including: 

• Do there exist parallel algorithms that achieve the approximations of our sequential
algorithms? For the general job shop problem achieving these approximations seems hard,
since we rely heavily on the algorithm of Sevast’yanov. For open shop scheduling,
however, a simple sequential algorithm achieves a factor of 2, whereas the best NC algorithm
that we have achieves only an O(log n)-approximation. As a consequence of the results
above, all one would need to do is to produce any greedy schedule.

•  Are  there  simple  variants  of  the  greedy  algorithm  for  open  shop  scheduling  that  achieve 
better  performance  guarantees?  For  instance,  how  good  is  the  algorithm  that  always 
selects  the  job  with  the  maximum  total  (remaining)  processing  time? 

• Our algorithms, while polynomial-time algorithms, are inefficient. Are there significantly
more efficient algorithms that have the same performance guarantees?


Chapter  6 


Derandomizing  Shop  Scheduling  Via 
Flow  Techniques 

In  this  chapter,  we  show  how  the  shop  scheduling  algorithms  of  the  previous  chapter  can  be 
made  deterministic.  One  approach,  which  appears  in  [60],  is  to  use  a  derandomized  version  of 
the  randomized  rounding  techniques  of  Raghavan  and  Thompson  [52],  which  are  alluded  to  in 
Chapter 3. While this approach yields a polynomial-time algorithm, the polynomial is rather
large,  since  the  bottleneck  step  is  the  solution  of  a  large  linear  program.  Recently,  Plotkin, 
Shmoys  and  Tardos  [48]  have  generalized  the  multicommodity  flow  approximation  algorithms 
of  Chapter  2  to  show  how  to  approximate  a  large  class  of  packing  linear  programs.  In  this 
chapter  we  shall  use  their  results  to  obtain  an  algorithm  that  is  much  more  efficient  than  the 
randomized  rounding  approach.  The  key  will  be  to  phrase  the  problem  of  choosing  initial  delays 
as the solution of a packing integer program. We then show how to find an integral solution
to  the  linear  relaxation  of  this  program  that  is  close  to  the  optimal  solution.  Besides  yielding 
a  faster  algorithm,  this  approach  also  yields  a  direct  method  for  finding,  in  polynomial  time, 
an  approximate  solution  to  a  certain  integer  program.  Previously,  such  solutions  could  only 
be  found  via  the  indirect  method  of  randomized  rounding.  Our  algorithm  will  imply  the  main 
result  of  this  chapter:  a  deterministic  version  of  the  shop  scheduling  algorithm  with  almost  the 
same  performance  guarantee  as  the  randomized  algorithm  of  Chapter  5. 

6.1 A Deterministic Approximation Algorithm

In this section, we "derandomize" the results of Chapter 5, i.e., we give a deterministic
polynomial-time algorithm that finds a schedule of length $O(\log^2(m\mu)\,C^*_{\max})$. Of all the
components of the algorithm of Theorem 5.3.3, the only step that is not already deterministic is the
step that chooses a random initial delay for each job so that, with high probability, no machine
is assigned too many jobs at any one time. The reduction to the special case in which $n$ and
$p_{\max}$ are bounded by a polynomial in $m$ and $\mu$ is entirely deterministic, and so we can focus
on that case alone. We give an algorithm that deterministically assigns delays to each job so
as to produce a schedule in which each machine has $O(\log(m\mu))$ jobs running at any one time.
We then apply Lemma 5.2.2 to produce a schedule of length $O(\log^2(m\mu)\,C^*_{\max})$. The bound of
$O(\log(m\mu))$ jobs per machine is not as good as the probabilistic bound of
$O(\log(m\mu)/\log\log(m\mu))$. We do not know how to achieve a bound of
$O(\log(m\mu)/\log\log(m\mu))$ deterministically*. By a proof nearly identical
to that of Lemma 5.2.1, however, we can show that in order to achieve this weaker bound of
$O(\log(m\mu))$ jobs per machine, we now only need to choose delays in the range $[0, \Pi_{\max}/\log(m\mu)]$.
In fact, the reduced range of delays yields a schedule of length $O(P_{\max}\log^2(m\mu) + \Pi_{\max}\log(m\mu))$,
which is within an $O(\log(m\mu))$ factor of optimal if $P_{\max} = O(\Pi_{\max}/\log(m\mu))$.

We  now  state  the  problem  formally: 

Problem 6.1.1 Deterministically assign a delay to each job in the range $[0, \Pi_{\max}/\log(m\mu)]$ so
as to produce a schedule with no more than $O(\log(m\mu))$ jobs on any machine at any time.
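
To make the objective of Problem 6.1.1 concrete, the following Python sketch measures the quantity being bounded: the largest number of jobs that occupy the same machine in the same time unit once every job has been shifted by its delay. The job representation (a list of `(machine, length)` operations processed back to back) and the name `congestion` are assumptions made for illustration only; they do not come from the thesis.

```python
from collections import Counter

def congestion(jobs, delays):
    """Return the maximum number of jobs on any machine in any time unit.

    jobs[j] is a list of (machine, length) operations that job j runs back
    to back; delays[j] is its integral initial delay (illustrative format).
    """
    load = Counter()  # (machine, time) -> number of jobs using that pair
    for ops, delay in zip(jobs, delays):
        t = delay
        for machine, length in ops:
            for step in range(t, t + length):
                load[(machine, step)] += 1
            t += length
    return max(load.values(), default=0)

# Three small jobs on two machines.
jobs = [[(0, 2), (1, 1)], [(0, 1), (1, 2)], [(1, 1), (0, 1)]]
without_delays = congestion(jobs, [0, 0, 0])
with_delays = congestion(jobs, [0, 2, 5])
```

In this toy instance the undelayed schedule puts two jobs on a machine at once, while the delays `[0, 2, 5]` separate them completely. Problem 6.1.1 asks for delays, chosen deterministically from a bounded range, that keep this value at $O(\log(m\mu))$.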

The  rest  of  this  chapter  focuses  on  solving  Problem  6.1.1. 

6.2  The  Framework 

In  this  section,  we  describe  the  framework  of  Plotkin,  Shmoys  and  Tardos[48]  for  approximately 
solving  packing  linear  programs.  We  will  be  somewhat  vague  in  our  description,  since  it  is  only 
meant  to  convey  the  main  ideas  of  their  work.  In  the  next  section,  we  will  be  more  formal  and 
prove  the  results,  citing  from  [48]  as  needed. 


*A recent result by Srinivasan [65] describes how to achieve this bound using different techniques.

Recall the concurrent flow problem that we solved in Chapter 2. We now state it as a linear
program:

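minimize $\lambda$
subject to

\begin{align}
\sum_{w : wv \in E} f_i(wv) - \sum_{w : vw \in E} f_i(vw) &= 0, && \text{for each node } v \notin \{s_i, t_i\},\ i = 1, \ldots, n; \tag{6.1} \\
\sum_{w : wv \in E} f_i(wv) - \sum_{w : vw \in E} f_i(vw) &= -d_i, && \text{for } v = s_i,\ i = 1, \ldots, n; \tag{6.2} \\
\sum_{w : wv \in E} f_i(wv) - \sum_{w : vw \in E} f_i(vw) &= d_i, && \text{for } v = t_i,\ i = 1, \ldots, n; \tag{6.3} \\
\sum_{i=1}^{n} f_i(vw) &\le \lambda \cdot u(vw), && \forall vw \in E; \tag{6.4} \\
f_i(vw) &\ge 0, && \forall vw \in E,\ i = 1, \ldots, n. \tag{6.5}
\end{align}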

Consider algorithm Concurrent. It initially finds a flow that is a feasible solution to this
linear program, for some value of $\lambda$. Each iteration finds a new flow ($f^*$, the minimum-cost
flow), and then takes a convex combination of the new and old flows, thereby producing a flow
that is still a feasible solution, but for a smaller value of $\lambda$. Informally, each iteration can be
thought of as tightening constraint (6.4).

Plotkin, Shmoys and Tardos [48] have shown that the techniques used for the multicommodity
flow problem in Chapter 2 apply to a much wider class of problems. They have given
approximation algorithms for a large class of packing linear programs. A packing linear program is one
that can be expressed in the following form:


minimize $\lambda$

subject to

\begin{align}
x &\in P; \tag{6.6} \\
Ax &\le \lambda b; \tag{6.7} \\
x &\ge 0; \tag{6.8}
\end{align}

where $A$ is a $p \times q$ non-negative matrix, $b$ is a $p$-dimensional non-negative vector, and $P$ is a
convex set in $\mathbb{R}^q$. We use $a_i$ to denote the $i$th row of $A$ and use $b_i$ to denote the $i$th entry in $b$.
Such linear programs are called packing programs because constraints (6.7) define the problem
of packing a convex combination of vectors subject to "capacity" constraints $b$.

Let $\lambda^*$ denote the minimum possible value of $\lambda$. The main result of [48] is that if a packing
linear program satisfies certain technical conditions, then a solution with $\lambda \le (1 + \epsilon)\lambda^*$ can be
found in polynomial time. Not all linear programs of the form above satisfy these technical
conditions, but several important applications, including minimum-cost multicommodity flow,
unrelated parallel machine scheduling and the Held-Karp lower bound for the traveling salesman
problem, do satisfy these conditions.

We  now  show  that  the  multicommodity  flow  problem  can  be  expressed  as  a  packing  linear 
program.  Constraints  (6.1)  through  (6.3)  correspond  to  equation  (6.6),  where  P  is  just  the  set 
of  convex  combinations  of  all  flows  satisfying  flow  conservation.  Constraint  (6.7)  corresponds 
to  inequality  (6.4),  the  capacity  constraints.  The  correspondence  between  equations  (6.5)  and 
(6.8)  is  straightforward. 

Our  multicommodity  flow  algorithm  requires  a  minimum-cost  flow  subroutine  for  each  iter¬ 
ation.  The  packing  algorithm  requires  a  subroutine  OPT  that 

Given an m-dimensional vector $y \ge 0$, finds $\tilde{x} \in P$ such that $c\tilde{x} = \min(cx : x \in P)$,
where $c = y^T A$.

Again, if we let $y$ be the edge lengths $\ell$ in the multicommodity flow algorithm, the subroutine
OPT is just a minimum-cost flow algorithm.

We do not wish to spend much time on the general case. The reader is referred to [48]
for a host of algorithms and applications. To understand how the algorithm works, we state
the main routine, Improve-Packing, which appears in Figure 6.1. Note the similarities with
Decongest. In particular, $\lambda$, the maximum edge congestion, is now $\max_i a_i x / b_i$, the maximum
amount by which the constraints (6.7) are violated. The constants $\alpha$ and $\sigma$ are chosen similarly
to the way they are in Decongest, but $\sigma$ depends on a new parameter $\rho = \max_i \max_{x \in P} a_i x / b_i$,
the width of $P$ relative to $Ax \le b$. The algorithm computes a cost $y_i$ for each constraint. It then
calls a routine that finds a minimum-cost point subject to these costs. Finally, the new solution
is set equal to a convex combination of the old and minimum-cost solutions. Performing one
iteration of this algorithm leads to a significant decrease in a potential function $\Phi = y^T b$. Given
these similarities, one can see how much of the basic analysis follows along the same lines as
that of the multicommodity flow algorithm.

Improve-Packing

$\lambda_0 \leftarrow \max_i a_i x / b_i$; $\alpha \leftarrow 4\lambda_0^{-1}\epsilon^{-1}\ln(2m\epsilon^{-1})$; $\sigma \leftarrow \epsilon/(4\alpha\rho)$

while $\max_i a_i x / b_i \ge \lambda_0/2$ and $x$ and $y$ do not satisfy suitable relaxed optimality conditions

(1) For each $i = 1, \ldots, m$: set $y_i \leftarrow \frac{1}{b_i} e^{\alpha a_i x / b_i}$.

(2) Find a minimum-cost point $\tilde{x} \in P$ for costs $c = y^T A$.

(3) Update $x \leftarrow (1 - \sigma)x + \sigma\tilde{x}$.
return $x$


Figure 6.1: Procedure Improve-Packing
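
The main loop of Improve-Packing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relaxed optimality test of [48] is omitted, so the loop stops solely on the $\lambda_0/2$ condition (or after `max_iters` steps); the step size `sigma = eps / (4 * alpha * rho)` is assumed here following the general scheme of [48]; and the names `improve_packing`, `opt` and `cheapest_vertex` are hypothetical. The example polytope is a single unit simplex, for which the minimum-cost point is always a vertex.

```python
import math

def improve_packing(A, b, x, opt, eps, rho, max_iters=10_000):
    """Sketch of the Improve-Packing loop with a simplified stopping rule."""
    def ratios(point):
        # a_i x / b_i for every packing constraint i
        return [sum(a * v for a, v in zip(row, point)) / bi
                for row, bi in zip(A, b)]

    m = len(A)
    lam0 = max(ratios(x))
    alpha = 4 / (lam0 * eps) * math.log(2 * m / eps)
    sigma = eps / (4 * alpha * rho)  # assumed step size
    for _ in range(max_iters):
        r = ratios(x)
        if max(r) < lam0 / 2:
            break
        # Step (1): exponential costs, one per constraint.
        y = [math.exp(alpha * ri) / bi for ri, bi in zip(r, b)]
        # Step (2): minimum-cost point for costs c = y^T A.
        c = [sum(y[i] * A[i][j] for i in range(m)) for j in range(len(x))]
        x_tilde = opt(c)
        # Step (3): convex combination of old and new points.
        x = [(1 - sigma) * xi + sigma * xt for xi, xt in zip(x, x_tilde)]
    return x

def cheapest_vertex(c):
    """Minimum-cost point of the unit simplex: put all weight on the cheapest entry."""
    j = min(range(len(c)), key=c.__getitem__)
    return [1.0 if i == j else 0.0 for i in range(len(c))]

A = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b = [1.0] * 4
x = improve_packing(A, b, [1.0, 0.0, 0.0, 0.0], cheapest_vertex, eps=0.5, rho=1.0)
```

Starting from a vertex of the simplex, where $\lambda_0 = 1$, the loop repeatedly moves a small amount of weight to the currently cheapest coordinate and stops once the largest coordinate has fallen below $\lambda_0/2$.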

We also discuss two extensions to the framework described above that will be needed in the
next section. The first occurs when the polytope $P$ can be expressed as a product of polytopes
of smaller dimension, i.e., $P = P^1 \times \cdots \times P^k$. A vector $x$ is now partitioned, in some way, into a
series of vectors $x^1, \ldots, x^k$, and $x \in P$ if and only if $x^i \in P^i$, $i = 1, \ldots, k$. The set of inequalities
$Ax \le b$ can be written as

$$\sum_i A^i x^i \le b. \tag{6.9}$$

The multicommodity flow problem can be formulated in this way. To give a concrete
example of this formulation, we explain the correspondence. With respect to the polytope
$P$, the multicommodity flow variables are $f(vw)$, the total amount of flow on edge $vw$. We
can decompose $P = P^1 \times \cdots \times P^k$ with $P^i$ representing the polytope of all feasible flows for
commodity $i$. As we have seen, each flow $f(vw)$ can be decomposed into flows of the individual
commodities on each edge. Therefore, we have a series of vectors $x^1, \ldots, x^k$, where each vector $x^i$
has $m$ components, one for each edge, representing $f_i(vw)$. The constraints (6.1) through (6.3)
are already written individually for each commodity, and the capacity constraints, which sum
the total flow of all commodities on each edge, are of the form given in equation (6.9). The
optimization routine is the same for each $i$, namely a minimum-cost flow routine for commodity $i$.

For the multicommodity flow problem, each component of the vector $x \in P$, say $x_e$, is
partitioned into $k$ parts, $x^1_e$ through $x^k_e$, such that $x^i \in P^i$ for each $i$ and, for each $e$,
$x_e = \sum_{i=1}^{k} x^i_e$. This particular partition of $x$ is not unique. Assume that $x$ has $r \cdot s$ components. It is possible
to partition $x$ into $r$ length-$s$ vectors such that $x^i = (x^i_1, \ldots, x^i_s) = (x_{s(i-1)+1}, \ldots, x_{si})$. We use
the latter partitioning on the job scheduling problem.

Other definitions for the product of polytope representation follow. In particular, we use
$a^i_j$ to denote the $j$th row of $A^i$. Let $\rho^i$ denote the width of $P^i$ relative to $A^i x^i \le b$, $i = 1, \ldots, k$.
Observe that $\rho = \sum_i \rho^i$. Instead of a subroutine OPT, we have a series of $k$ subroutines. The
$i$th subroutine minimizes $cx^i$ subject to $x^i \in P^i$ for costs $c = y^T A^i$.

In  [48],  the  main  motivation  for  introducing  this  formulation  is  to  allow  the  use  of  random¬ 
ness.  Where  algorithm  Concurrent  randomly  chooses  a  commodity  i  to  reroute,  Plotkin, 
Shmoys and Tardos randomly choose a polytope $P^i$ over which to optimize. For the purposes
of  shop  scheduling,  we  are  concerned  only  with  a  deterministic  algorithm,  and  so  the  main 
reason  for  introducing  this  formulation  is  to  simplify  our  algorithm.  By  separating  out  the 
different  polytopes,  an  iteration  of  Improve-Packing  can  be  applied  to  one  job  at  a  time. 

The  other  extension  we  need  to  make  deals  with  integral  solutions.  In  Problem  6.1.1,  we  need 
to  choose  integral  delays  for  each  job.  The  procedure  Improve-Packing  makes  no  attempt 
to maintain an integral solution. However, Improve-Packing can be modified to obtain an
integral  solution.  We  proceed  to  outline  this  modification,  which  is  similar  to  the  one  used  in 
Section  3.2. 

First, we need to maintain that the point $\tilde{x}$ returned by the optimization routine is integral.
We make use of the fact that the optimization routines of interest have the property that there
always exists an optimal integral solution. Hence, without loss of generality, we can restrict the
search to integral solutions. For multicommodity flow, the optimization routine has integral
solutions, since it is well-known that a minimum-cost flow problem with integral data always
has an optimal integral solution. As we shall see, for shop scheduling, it is also true that
the optimization routine always has an optimal integral solution. Second, we have to ensure
that $(1 - \sigma)x + \sigma\tilde{x}$ is integral, which is accomplished by maintaining that all components $x^i_j$
are integral multiples of the current value of $\sigma$. The modifications needed in the analysis are
similar to those needed in Section 3.2. In particular, it is still possible to show that even with
the restriction to integral solutions, a potential function $\Phi = y^T b$ decreases in each iteration,
and the running time remains the same as that of the non-integral version, up to constant
factors. In addition, by analysis similar to that used in Theorem 3.2.1, we can obtain the
following theorem:

Theorem 6.2.1 (Plotkin, Shmoys, Tardos [48]) Let $\bar\rho = \max_i \rho^i$, and let $\bar\lambda = \max\{\lambda^*, (\bar\rho/d)\log M\}$,
where $M$ is the number of packing constraints and $d$ is a parameter such that each component
of each $x^i$ is comprised of integral multiples of $d$. There exists an integral solution to
$\sum_i A^i x^i \le \lambda b$ with $x^i \in P^i$ for $i = 1, \ldots, k$ and $\lambda \le \lambda^* + O\left(\sqrt{\bar\lambda(\bar\rho/d)\log(Mkd)}\right)$. Repeated
calls to the deterministic integral version of Improve-Packing find such a solution $(x, \lambda)$ using
$O\left(d\rho/\bar\rho + \rho\log(M)/\bar\lambda + k\log(d\rho/\bar\rho)\right)$ calls to each of the $k$ subroutines. Further, throughout the
execution, $\epsilon = \Omega\left(\sqrt{\log(Mkd)/(d\bar\lambda)}\right)$.

For the remainder of this chapter, we focus specifically on the application to shop scheduling
and  quote  results  from  [48]  as  needed. 

6.3  The  Solution 

We now turn to the solution of Problem 6.1.1. Since we introduce initial delays in the range
$[0, \Pi_{\max}/\log(m\mu)]$, the resulting schedule has length $\ell = P_{\max} + \Pi_{\max}/\log(m\mu)$. We can represent
the processing of a job $J_j$ with a particular initial delay $d$ by an $(\ell \cdot m)$-length $\{0,1\}$-vector
where each position corresponds to a machine at a particular time. The position corresponding
to machine $m_i$ and time $t$ is 1 if $m_i$ is processing job $J_j$ at time $t$, and 0 otherwise. For each
job $J_j$ and each possible delay $d$, there is a vector $V_{jd}$ that corresponds to assigning delay $d$ to
$J_j$.

Let $\mathcal{V}_j$ be the set of vectors $\{V_{j1}, \ldots, V_{jd_{\max}}\}$, where $d_{\max} = \Pi_{\max}/\log(m\mu)$, and let $V_{jd}(i)$ be
the $i$th component of $V_{jd}$. Given the set $\Lambda = \{\mathcal{V}_1, \ldots, \mathcal{V}_n\}$ of sets of vectors, Problem 6.1.1 can
be stated as the problem of choosing one vector from each $\mathcal{V}_j$ (denoted $V_j^*$), such that

$$\Bigl\| \sum_{j=1}^{n} V_j^* \Bigr\|_\infty = O(\log(m\mu)).$$

In words, we wish to ensure that at any time on any machine, the number of jobs using that
machine is $O(\log(m\mu))$.

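We can reformulate this problem as a $\{0,1\}$-integer program. Let $x_{jd}$ be the indicator
variable used to indicate whether $V_{jd}$ is selected from $\mathcal{V}_j$. The vector $x$ has length $n \cdot d_{\max}$, where
the entry for job $J_j$ and delay $d$ is denoted $x_{jd}$. The problem of assigning delays to the jobs so
as to minimize congestion can be phrased as the following integer program (ICONJ):

minimize $\lambda$
subject to

\begin{align}
\sum_{d=1}^{d_{\max}} x_{jd} &= 1, && j = 1, \ldots, n; \tag{6.10} \\
\sum_{j=1}^{n} \sum_{d=1}^{d_{\max}} V_{jd}(i)\, x_{jd} &\le \lambda \log(m\mu), && i = 1, \ldots, \ell m; \tag{6.11} \\
x &\in \{0,1\}^{n \cdot d_{\max}}. \tag{6.12}
\end{align}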

Note that we put $\log(m\mu)$ on the right-hand side because we know, by Lemma 5.2.1, that
there exists a solution where the maximum number of jobs on any machine is $O(\log(m\mu))$.
Thus $\lambda^* = O(1)$. However, $\lambda^*$ can be as small as $1/\log(m\mu)$. While our algorithm may find
such a solution, the best that we can show is that it always finds a solution with $\lambda = O(1)$.

The linear programming relaxation of ICONJ is just equations (6.10), (6.11) and the
constraints that $x \ge 0$. Note that the constraints $x \le 1$ would be redundant, given (6.10). In order
to show that the linear programming relaxation of ICONJ is a packing linear program, we will
show how to express it in the form given in the definition of a packing linear program.

Constraint (6.11) is clearly in the form $Ax \le \lambda b$, where $b$ is a vector in which each component
is equal to $\log(m\mu)$ and each column of $A$ corresponds to one vector $V_{jd}$. Next we consider the
constraints (6.10). First, we see that these can be decomposed into $n$ different constraints,
one for each job $J_j$. Thus the polytope $P$, defined by all constraints (6.10), can be decomposed
into $n$ polytopes $P = P^1 \times \cdots \times P^n$, where polytope $P^j$ corresponds to job $J_j$. Now we focus
on a particular polytope $P^j$. The constraint $\sum_d x_{jd} = 1$ just says that every vector
$x^j \in P^j$ must lie in the $d_{\max}$-dimensional unit simplex. Each vertex of simplex $P^j$ corresponds
to choosing a particular delay for job $J_j$.

With the problem phrased in these terms, we can now use Theorem 6.2.1 to bound the
number of iterations that an integral version of Improve-Packing needs to find a solution to
ICONJ with $\lambda = O(1)$. We will deal with the implementation of an iteration later. Note
that this is a typical use of the framework of Plotkin, Shmoys and Tardos; one bounds the
number of iterations in a fairly standard manner but then must implement one iteration in a
problem-specific fashion.

Lemma 6.3.1 An integral solution to ICONJ can be found such that $\lambda = O(1)$, using $O(n\log(m\mu))$
iterations of a deterministic integral version of Improve-Packing. Further, throughout the
execution, $\epsilon = \Omega(1)$.


Proof: We must compute the particular values of the various parameters in Theorem 6.2.1.
The width

$$\bar\rho = \max_j \rho^j = \max_j \max_i \max_{x^j \in P^j} \left(a^j_i x^j / b_i\right).$$

From constraints (6.10), $a^j_i x^j \le 1$ for all $i, j$ and $b_i = \log(m\mu)$ for all $i$, and therefore
$\bar\rho = 1/\log(m\mu)$. The polytope $P$ is the cross product of $n$ identical polytopes, and hence
$\rho = n\bar\rho$. The number of polytopes is $k = n$, and $d = 1$, since all variables are integral. Next, we
bound the number of constraints. There is one constraint for every possible time unit on every
machine. There are $m$ machines and $P_{\max} + \Pi_{\max}/\log(m\mu)$ possible time units. Hence

$$M = m(P_{\max} + \Pi_{\max}/\log(m\mu)). \tag{6.13}$$

From Lemma 5.3.1 we know that the maximum operation size $p_{\max} = O(n\mu)$, and from Lemma
5.3.2, the number of jobs $n = O(m^2\mu^3)$. The maximum machine load, $\Pi_{\max}$, is maximized if
all operations are of length $p_{\max}$ and fall on one particular machine, thus $\Pi_{\max} \le n\mu p_{\max} =
O(n^2\mu^2)$. The maximum job length, $P_{\max}$, is at most $\mu p_{\max}$, so $P_{\max} = O(n\mu^2)$. Putting these
bounds together, we obtain

$$M = m(P_{\max} + \Pi_{\max}/\log(m\mu)) = O\left(m\left(n\mu^2 + \frac{n^2\mu^2}{\log(m\mu)}\right)\right),$$

which is polynomial in $m$ and $\mu$. We will use below that $\log M = O(\log(m\mu))$. Now we turn to the quality of the solution.

$$\bar\lambda = \max\{\lambda^*, (\bar\rho/d)\log M\} = \max\left\{O(1), \frac{\log M}{\log(m\mu)}\right\} = O(1).$$

In addition, since we know by Lemma 5.2.1 that there exists a solution with $\lambda = O(1)$,
$\lambda^* = O(1)$. Thus

\begin{align*}
\lambda &\le \lambda^* + O\left(\sqrt{\bar\lambda(\bar\rho/d)\log(Mkd)}\right) \\
&= O(1) + O\left(\sqrt{\frac{\log(Mn)}{\log(m\mu)}}\right) \\
&= O(1).
\end{align*}

For the running time we need

\begin{align*}
O\left(d\rho/\bar\rho + \rho\log(M)/\bar\lambda + k\log(d\rho/\bar\rho)\right)
&= O\left(n + \frac{n\log M}{\log(m\mu)} + n\log n\right) \\
&= O(n\log n) \\
&= O(n\log(m\mu))
\end{align*}

calls.

Finally, the error $\epsilon = \Omega\left(\sqrt{\log(Mkd)/(d\bar\lambda)}\right) = \Omega(1)$. ■

We  now  turn  to  the  time  needed  to  implement  one  iteration.  We  will  show  that  an  iteration 
can  be  implemented  efficiently  in  the  RAM  model  of  computation.  While  it  appears  that  the 
algorithms  in  [48]  can  be  implemented  in  the  RAM  model  using  techniques  similar  to  those  used 
in  Section  2.4.2  of  this  thesis,  the  computation  is  not  explicitly  done  in  [48].  Here  we  perform  the 
necessary  computations  for  the  case  of  program  ICONJ.  The  first  step  of  Improve-Packing 
computes the costs $y_i \leftarrow \frac{1}{b_i} e^{\alpha a_i x / b_i}$. As before, the difficulty here is that we have to compute
exponential functions. In order to have an efficient algorithm, we must bound the precision
needed in our computation.

We now show that we need only $O(\log(m\mu))$ bits of precision for each component of $y$. We
use the following theorem of [48], which is similar to Lemma 2.4.24.

Theorem 6.3.2 [48] Let $C_P(y)$ be the value returned by OPT, the minimization subroutine. If
we replace OPT by an algorithm that finds a point $\tilde{x}$ such that

$$y^T A \tilde{x} \le \left(1 + \tfrac{\epsilon}{2}\right) C_P(y) + \tfrac{\epsilon}{2}\lambda y^T b \tag{6.14}$$

for any $y \ge 0$, then all the bounds on the running time of the algorithm still hold. In particular, a
potential function $\Phi = y^T b$ still decreases by the same amount, up to constant factors.

While this theorem stipulates that an approximate $\tilde{x}$ is sufficient, it does not explain how
to find one efficiently. We now show that such an $\tilde{x}$ can be found. The idea is to compute an
approximate set of dual variables $\hat{y}$, such that exact optimization with respect to $\hat{y}$ produces
an $\hat{x}$ satisfying inequality (6.14).

The approximate dual variables $\hat{y}$ are integral and consist of $O(\log(m\mu))$ bits per component. We
introduce two parameters: $\gamma$, an amount by which each approximate dual variable is scaled, and
$\zeta$, which controls the number of bits of accuracy in each approximate dual variable. We set
$\gamma = \zeta e^{\alpha\lambda}/\log(m\mu)$, and will compute approximate dual variables $\hat{y}$ so that $\gamma\hat{y} \le y$ and each
component of $\hat{y}$ can be represented in $O(\log(1/\zeta))$ bits. It will take $O(\log(m\mu))$ time to compute
one component of $\hat{y}$.

For each component $\hat{y}_i$, first we compute $e^{\alpha(a_i x/b_i - \lambda)}$ approximately so that it has at most
$\zeta = \epsilon/(4\mu p_{\max})$ additive error, then we multiply the result by $\zeta^{-1}$, take the integer part, and set
$\hat{y}_i$ to be this value. Using the Taylor series, we can compute one bit of $e^x$ in $O(1)$ time. Since
$e^{\alpha(a_i x/b_i - \lambda)}$ is at most 1 for every $i$, it is sufficient to compute $O(\log(1/\zeta))$ bits to achieve
the desired approximation. Therefore each $\hat{y}_i$ can be computed in $O(\log(1/\zeta)) = O(\log(m\mu/\epsilon))$
time. From Theorem 6.2.1, we know that $\epsilon = \Omega(1)$, hence $O(\log(m\mu/\epsilon))$ is just $O(\log(m\mu))$.
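
The rounding step can be illustrated with a short Python sketch. It assumes the scaling $\gamma = \zeta e^{\alpha\lambda}/b_i$ with $b_i = \log(m\mu)$, and it uses the library exponential in place of a truncated Taylor series, so it shows only the scale-and-truncate part of the computation, not the bit-level precision analysis. The function name `quantized_dual` and the numeric values are hypothetical.

```python
import math

def quantized_dual(ratio, lam, alpha, zeta):
    """Integer approximation of exp(alpha * (ratio - lam)) in units of zeta.

    ratio stands for a_i x / b_i and lam for the largest such ratio, so the
    exponent is at most 0 and the exponential lies in (0, 1].
    """
    return math.floor(math.exp(alpha * (ratio - lam)) / zeta)

alpha, lam, b_i = 3.0, 1.0, 2.0   # illustrative values only
zeta = 1 / 64
gamma = math.exp(alpha * lam) * zeta / b_i

gaps = []
for ratio in (1.0, 0.75, 0.2):
    y_exact = math.exp(alpha * ratio) / b_i        # exact dual variable y_i
    y_hat = quantized_dual(ratio, lam, alpha, zeta)
    gaps.append(y_exact - gamma * y_hat)           # error after rescaling by gamma
```

Each rescaled value `gamma * y_hat` underestimates the exact dual variable by at most a small multiple of `gamma`, and every `y_hat` is an integer no larger than `1 / zeta`, which is why $O(\log(1/\zeta))$ bits suffice to store it.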

Because of the approximation and the integer rounding, the vector $\hat{x}$, which is of minimum
cost with respect to $\hat{y}$, is not necessarily minimum-cost with respect to $y$. However, we now
show that an $\hat{x}$ that is minimum-cost with respect to $\hat{y}$ satisfies condition (6.14).

Lemma 6.3.3 Let $\hat{y}$ be a set of dual variables computed as above. Then a point $\hat{x}^j \in P^j$
minimizing $\hat{y}^T A^j x^j$ satisfies inequality (6.14), where $C_P(y)$ is computed with respect to exact dual
variables $y$. Further, each $\hat{y}_i$ can be represented in $O(\log(m\mu))$ bits.

Proof: The idea is to show that if we minimize with respect to the approximate dual variables
$\hat{y}$, the resulting solution, $\hat{x}$, is "close" to the true minimum solution $\tilde{x}$. We will first bound the
maximum possible difference between any component of $y$ and the corresponding component of
$\gamma\hat{y}$. We will then translate this into a bound on the difference between $\hat{c}^j = \hat{y}^T A^j$ and $c^j = y^T A^j$.
Finally, we will show that both $\hat{x}$ and $\tilde{x}$ have a very special structure: they have exactly
one component equal to 1 and the rest equal to 0. This will allow us to bound the difference
between $y^T A\hat{x}$ and $C_P(y)$ and hence prove the lemma.

We now proceed with the details. Recall that $\gamma = \zeta e^{\alpha\lambda}/\log(m\mu)$ and $\zeta = \epsilon/(4\mu p_{\max})$. We
bound the difference between $y$ and $\gamma\hat{y}$, the approximate dual variables scaled to have the same
units as $y$. In computing $\gamma\hat{y}$, we introduce errors in several places. When computing $e^{\alpha(a_i x/b_i - \lambda)}$
to a precision of $\zeta$, we introduce an error of at most $\zeta$. This error is multiplied by $\zeta^{-1}$ and
may be increased by 1 when we round $\hat{y}$ down to an integer. Finally, when we scale $\hat{y}$ to have
the same units as $y$, the entire error gets scaled by $\gamma$. Thus,

\begin{align}
y_i - \gamma\hat{y}_i &\le \gamma\left(\zeta(\zeta^{-1}) + 1\right) \notag \\
&= 2\gamma. \tag{6.15}
\end{align}

Now consider $\hat{c}^j = \hat{y}^T A^j$. Each column of $A^j$ corresponds to a vector $V_{jd}$ for some $d$ and
hence is a vector of length $\ell \cdot m$ that has at most $\mu p_{\max}$ ones. Thus each entry of $\hat{c}^j$ is the
sum of at most $\mu p_{\max}$ entries of $\hat{y}$. By inequality (6.15), we know that $\gamma\hat{y}_i$ differs from $y_i$ by at
most $2\gamma$. Hence the difference between an entry of $\gamma\hat{c}^j$, say $\gamma\hat{c}^j_d$, and the corresponding entry of
$c^j = y^T A^j$ is at most $2\gamma\mu p_{\max}$. Consider the problem of finding an $\hat{x}^j \in P^j$ that minimizes $\hat{c}^j x^j$.
(The case for $c^j x^j$ is identical.) The vector $\hat{c}^j$ is non-negative. Let $\hat{c}^j_{d'}$ be the component of $\hat{c}^j$
with minimum value. Recall that constraints (6.10) require that $\sum_d x_{jd} = 1$. Then it is easy
to see that the $\hat{x}^j \in P^j$ that minimizes $\hat{c}^j x^j$ has $\hat{x}_{jd'} = 1$ and all other components of $\hat{x}^j$ equal to
0. This setting is the vector $\hat{x}$. Thus for the $\hat{x}^j$ that minimizes $\hat{c}^j x^j$, we have that

\begin{align}
c^j\hat{x}^j - \gamma\hat{c}^j\hat{x}^j &= (c^j - \gamma\hat{c}^j)\hat{x}^j \notag \\
&= c^j_{d'} - \gamma\hat{c}^j_{d'} \notag \\
&\le 2\gamma\mu p_{\max}, \tag{6.16}
\end{align}

where the last inequality follows from the discussion above. But

$$2\gamma\mu p_{\max} = \frac{2e^{\alpha\lambda}\epsilon\mu p_{\max}}{4\mu p_{\max}\log(m\mu)} = \frac{\epsilon e^{\alpha\lambda}}{2\log(m\mu)}. \tag{6.17}$$

We know that $e^{\alpha\lambda} \le y^T b$, since at least one component of $y$ is equal to $e^{\alpha\lambda}/\log(m\mu)$, all components
are non-negative and $b_i = \log(m\mu)$ $\forall i$. Also, since some job must execute on some machine,
$\lambda \ge 1/\log(m\mu)$. Combining inequalities (6.16) and (6.17) with these two bounds we get that

$$c^j\hat{x}^j - \gamma\hat{c}^j\hat{x}^j \le \frac{\epsilon\lambda}{2} y^T b. \tag{6.18}$$

But $\gamma\hat{c}^j\hat{x}^j$ just selects the minimum element of $\gamma\hat{c}^j$. Since, componentwise, $\gamma\hat{c}^j \le c^j$, we know that
$\gamma\hat{c}^j\hat{x}^j \le C_P(y)$, which together with inequality (6.18) implies inequality (6.14). ■

So, without affecting the performance of our algorithm, we can use approximate costs. Note
that we have also shown that the subroutine OPT applied to this problem always has an optimal
solution that is integral. Further, we have a nice combinatorial characterization of this routine,
since it reduces to finding the minimum of a set of numbers. Yet, in order to have an efficient
algorithm, more work needs to be done. Since the number of entries in the matrix $A$ and
the vectors $y$ and $b$ are extremely large polynomials in $m$ and $\mu$, we would like to avoid using
straightforward matrix-vector and vector-vector multiplications, since they take too much time.
We can obtain a more efficient algorithm by taking advantage of the structure of the problem
and noticing that between two iterations, not too many variables change.
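
The combinatorial characterization of OPT for this problem is simple enough to state as code. The sketch below assumes the costs are given as one list per job, indexed by delay; the name `min_cost_vertices` is hypothetical. Because each polytope $P^j$ is a unit simplex, a minimum-cost point is the vertex that selects the cheapest delay for that job.

```python
def min_cost_vertices(costs):
    """Pick the cheapest delay for every job.

    costs[j][d] is the cost of giving job j delay index d (illustrative
    layout). Returns the chosen delay indices and the matching 0/1 vectors.
    """
    delays = [min(range(len(c)), key=c.__getitem__) for c in costs]
    x = [[1 if d == best else 0 for d in range(len(c))]
         for c, best in zip(costs, delays)]
    return delays, x

delays, x = min_cost_vertices([[3, 1, 2], [0, 5, 4]])
```

Each returned vector has exactly one entry equal to 1, so the solution is integral without any rounding.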

In the remainder of this chapter, we will first show that in each iteration of
Improve-Packing, a small number of the components of $y$ change. We will then show that if a small
number of the components of $y$ change then a small number of the components of $c$ change.
Further, we will show how to compute an entry of $c$ in less time than the naive method of
multiplying by a column in $A$. We will then show that, by using a heap data structure, we
can efficiently compute $\min\{y^T A^j x^j : x^j \in P^j\}$ and thus be able to conclude that an iteration
of Improve-Packing can be implemented efficiently.


Lemma 6.3.4 In each iteration of Improve-Packing, $O(n\mu^2)$ components of $y$ change. Further,
these changes can be computed in $O(n\mu^2\log(m\mu))$ time.

Proof: Consider the evaluation of the statement $x^j \leftarrow (1 - \sigma)x^j + \sigma\tilde{x}^j$. Since we maintain
that $\sigma = 1$ throughout, this reduces to $x^j \leftarrow \tilde{x}^j$. In other words, we take job $J_j$ and change
its assigned delay from some value $d'$ to some other value $d''$, setting $x_{jd'} = 0$ and $x_{jd''} = 1$. By
changing the value of one variable $x_{jd}$, we affect the value of at most $P_{\max}$ dual variables, because
setting $x_{jd} = 1$ implies that job $J_j$ runs for at most $P_{\max}$ time units starting at time $d$. Since
each $y_i$ corresponds to a particular machine at a particular time, only the $P_{\max}$ components of
$y$ that correspond to machine-time pairs in the schedule implied by $x_{jd}$ are affected. Thus we
must recompute at most $2P_{\max}$ components of $y$, and $2P_{\max} = O(n\mu^2)$.

Once we know which elements to compute, Lemma 6.3.3 tells us that each one can be
computed in $O(\log(m\mu))$ time. To identify the components of $y$ to recompute, we simulate
the execution of job $J_j$ with delay $d$. We simply walk through the corresponding schedule for
that job and, for each machine-time pair that the job uses, update the corresponding dual
variable.

■
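
The walk described in the proof can be sketched as follows. This is an illustrative sketch, not the thesis's implementation: jobs are assumed to be lists of `(machine, length)` operations run back to back, machine loads are kept in a `Counter`, and the names `cells` and `move_job` are hypothetical. The function returns the machine-time pairs whose dual variables would have to be recomputed.

```python
from collections import Counter

def cells(ops, delay):
    """Machine-time pairs occupied by a job that starts after `delay` time units."""
    out, t = [], delay
    for machine, length in ops:
        out.extend((machine, s) for s in range(t, t + length))
        t += length
    return out

def move_job(load, ops, old_delay, new_delay):
    """Change one job's delay in place; return the pairs that were touched."""
    touched = set()
    for cell in cells(ops, old_delay):
        load[cell] -= 1          # the job no longer uses this pair
        touched.add(cell)
    for cell in cells(ops, new_delay):
        load[cell] += 1          # the job now uses this pair
        touched.add(cell)
    return touched

ops = [(0, 2), (1, 1)]
load = Counter(cells(ops, 0))
touched = move_job(load, ops, old_delay=0, new_delay=3)
```

The number of touched pairs is at most twice the length of the job, which mirrors the $2P_{\max}$ bound in the proof.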

We now turn to the second and third steps of Improve-Packing. Step 2 involves finding
a minimum-cost point $\tilde{x}^j \in P^j$ for costs $c^j = y^T A^j$, $j = 1, \ldots, n$, and step 3 involves updating
the solution.

We first show that in an iteration, not too many of the components of $c$ change. As we
saw in Lemma 6.3.4, the only components of $y$ that change are those associated with machines
running in the time intervals $(d', d' + P_{\max} - 1)$ and $(d'', d'' + P_{\max} - 1)$. Each component of $c^j$,
$c^j_d$, is associated with starting job $J_j$ at time $d$ then running for up to $P_{\max}$ units. In fact, $c^j_d$ is
equal to the sum of the components of $y$ associated with the machine-time pairs on which job $J_j$
is active. Since all the components of $y$ that changed are in the two intervals $(d', d' + P_{\max} - 1)$
and $(d'', d'' + P_{\max} - 1)$, the only components of $c^j$ that can change must be associated with jobs
running in those intervals. But the only jobs that can be running in those intervals are those that
receive initial delays in the range $(d' - P_{\max}, d' + P_{\max} - 1)$ or the range $(d'' - P_{\max}, d'' + P_{\max} - 1)$.
There is one component of $c^j$ associated with each possible delay, and hence for a particular
job $J_j$, there are a total of at most $4P_{\max}$ changes overall. Summing over all the jobs, we get a
total of $4nP_{\max}$ possible changes.

We could recompute these $4nP_{\max}$ components of $c$ by taking the dot product of two
length-$d_{\max}$ vectors. However, we can perform this computation more efficiently.

Lemma 6.3.6 Assume that we have computed the new values of the components of $y$ as in
Lemma 6.3.4. Then the correct values of $c = c^1, \ldots, c^n$ can be computed in $O(n^2\mu^3)$ time.

Proof: Assume that we have already computed $c^j_d$ and we wish to compute $c^j_{d+1}$. Recall the
definition of $V_{jd}$. It has a component for every time on every machine, and a component is one
when the associated machine-time pair is busy. We can express

    $c^j_d = y^T V_{jd}$

and

    $c^j_{d+1} = y^T V_{j(d+1)}$.

Additionally, $V_{j(d+1)}$ corresponds to running the operations of job $J_j$ on the same sequence of
machines as $V_{jd}$, only one time unit later. Therefore, the two vectors $V_{jd}$ and $V_{j(d+1)}$ differ in
at most $2\mu$ positions. Hence, $c^j_{d+1}$ can be computed from $c^j_d$ using $O(\mu)$ additions, assuming
we know in which components $V_{jd}$ and $V_{j(d+1)}$ differ. Since the matrix $A$ is fixed throughout the
algorithm, we can use a preprocessing phase to construct a series of lists $X^j_d$, one for each
$c^j_d$, $j = 1, \ldots, n$, $d = 2, \ldots, d_{\max}$. Each $X^j_d$ contains a list of the up to $2\mu$ positions in which
$V_{jd}$ differs from $V_{j(d-1)}$, along with an indication of whether the difference is that $V_{jd} = 1$ and
$V_{j(d-1)} = 0$ or vice versa. There is no predecessor of $c^j_1$, but we can precompute $X^j_1$, $j = 1, \ldots, n$,
the list of positions in which $V_{j1} = 1$, to speed up the computation of $c^j_1$.

Thus for each $j$, we may have to compute $c^j_1$, which takes $O(\mu P_{\max})$ time, and then $4P_{\max}$
more values, each of which takes $O(\mu)$ time. Summing over all jobs, we get a total time of
$O(n(\mu P_{\max} + \mu P_{\max})) = O(n^2\mu^3)$. ■
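The incremental computation used in this proof can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation from the thesis: the function names `build_diff_lists` and `costs_from_diffs` are hypothetical, delays are 0-indexed, and the 0/1 vectors for a single job are given explicitly so that the difference lists can be built by direct comparison in the preprocessing phase.

```python
from typing import List, Tuple

Diff = List[Tuple[int, int]]  # (position, +1 or -1)


def build_diff_lists(V: List[List[int]]) -> Tuple[List[int], List[Diff]]:
    """Preprocessing for one job: V[d] is the 0/1 vector for delay d."""
    # Positions where the first vector is 1 (used to compute the first cost).
    first_ones = [pos for pos, bit in enumerate(V[0]) if bit == 1]
    diffs: List[Diff] = []
    for d in range(1, len(V)):
        # +1 where V[d] gained a 1 relative to V[d-1], -1 where it lost one.
        diffs.append([(pos, V[d][pos] - V[d - 1][pos])
                      for pos in range(len(V[d]))
                      if V[d][pos] != V[d - 1][pos]])
    return first_ones, diffs


def costs_from_diffs(y: List[int], first_ones: List[int],
                     diffs: List[Diff]) -> List[int]:
    """Return the costs y . V[d] for every d, updating each from the previous one."""
    costs = [sum(y[pos] for pos in first_ones)]
    for diff in diffs:
        # Only the positions that changed contribute to the update.
        costs.append(costs[-1] + sum(sign * y[pos] for pos, sign in diff))
    return costs


# Toy instance (an assumption for illustration): one machine, five time
# slots, and a job that occupies two consecutive slots starting at its delay.
V = [[1, 1, 0, 0, 0],
     [0, 1, 1, 0, 0],
     [0, 0, 1, 1, 0],
     [0, 0, 0, 1, 1]]
y = [1, 2, 3, 4, 5]
first_ones, diffs = build_diff_lists(V)
costs = costs_from_diffs(y, first_ones, diffs)
```

In this toy instance the job has a single operation, so consecutive vectors differ in exactly two positions, and each cost after the first is obtained with two additions rather than a full dot product.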

As mentioned before, given a vector $c^j$, the problem of finding an $x$ that minimizes $c^j x$ for a
job $J_j$ consists of choosing the minimum component of $c^j$. Thus, for each job $J_j$, we maintain a
heap $H^j$ consisting of the $d_{\max}$ values $c^j_d$, $d = 1, \ldots, d_{\max}$. We also maintain a list $\mathcal{L}^j$, where the
$i$th component of $\mathcal{L}^j$, $\mathcal{L}^j_i$, contains a pointer to the position that $c^j_i$ occupies in heap $H^j$. This
data structure allows us to insert an element, delete an arbitrary element and find the minimum
value, all in $O(\log d_{\max}) = O(\log(m\mu))$ time. Thus we can minimize $c^j x$ in $O(\log(m\mu))$ time by
choosing the minimum element out of the heap.

1. Let $x_{d'}$ and $x_{d''}$ be the variables changed in the previous iteration.

2. Compute $y$ as described in Lemma 6.3.4.

3. Let $\mathcal{D} = (d' - P_{\max},\, d' + P_{\max} - 1) \cup (d'' - P_{\max},\, d'' + P_{\max} - 1)$ be the set of possible initial
delays affected.

4. Recompute $c^j_d$, $j = 1, \ldots, n$; $\forall d \in \mathcal{D}$, making the appropriate heap changes.

5. Compute the minimum value in each heap. Let the minimum value in $H^j$ be $c^j_{i_j}$, $j = 1, \ldots, n$.

6. Compute the decrease in $\Phi = y^T b$ associated with replacing the current assignment for
job $J_j$ with a minimum-cost assignment, $j = 1, \ldots, n$.

7. Let $j'$ be the job that maximizes the decrease in $\Phi$. Let $x$ be the vector with a 1 in
position $i_{j'}$ and 0 elsewhere. Let $x^{j'} \leftarrow x$.

Figure 6.2: One iteration of the algorithm

Of course, we must update the heap when costs change. Each change can be implemented as
a delete followed by an insert. However, as we saw in Lemma 6.3.5, there are only $O(P_{\max})$ changes
in the costs for each job, for a total of $O(nP_{\max})$ heap operations.
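A heap with a companion list of positions, as described above, can be sketched in Python as follows. This is an illustrative sketch rather than the data structure from the thesis: the class name `IndexedMinHeap` and its methods are hypothetical, keys are the 0-indexed delays, and a cost change is implemented as an in-place key change (sift up, then sift down), which stands in for the delete followed by insert mentioned in the text and has the same logarithmic cost.

```python
from typing import List, Tuple


class IndexedMinHeap:
    """Binary min-heap over keys 0..n-1 with a list of heap positions."""

    def __init__(self, values: List[float]) -> None:
        self.val = list(values)                 # val[key] = current cost
        self.heap = list(range(len(self.val)))  # heap[i] = key stored at slot i
        self.pos = list(range(len(self.val)))   # pos[key] = slot of key in heap
        for i in reversed(range(len(self.heap) // 2)):
            self._sift_down(i)

    def _swap(self, i: int, j: int) -> None:
        h = self.heap
        h[i], h[j] = h[j], h[i]
        self.pos[h[i]] = i   # keep the pointer list in step with the heap
        self.pos[h[j]] = j

    def _sift_up(self, i: int) -> None:
        while i > 0:
            parent = (i - 1) // 2
            if self.val[self.heap[i]] >= self.val[self.heap[parent]]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i: int) -> None:
        n = len(self.heap)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.val[self.heap[child]] < self.val[self.heap[smallest]]:
                    smallest = child
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest

    def update(self, key: int, new_value: float) -> None:
        """Change the cost of `key`, locating it through the pointer list."""
        self.val[key] = new_value
        self._sift_up(self.pos[key])
        self._sift_down(self.pos[key])

    def minimum(self) -> Tuple[int, float]:
        """Return (key, value) of the minimum-cost entry."""
        key = self.heap[0]
        return key, self.val[key]


heap = IndexedMinHeap([5, 3, 8, 1])
first_min = heap.minimum()
heap.update(3, 10)
second_min = heap.minimum()
heap.update(2, 0)
third_min = heap.minimum()
```

The pointer list is what makes an arbitrary entry reachable without searching the heap; without it, locating a changed cost would take time linear in the heap size.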

We  can  now  restate  the  algorithm  for  one  iteration  of  Improve-Packing,  specialized  to 
Problem 6.1.1. This algorithm appears in Figure 6.2 and is a summary and formalization of the
ideas  presented  in  this  section.  Note  that,  in  spite  of  the  phrasing  of  the  problem  as  a  linear 
program,  at  no  point  do  we  have  to  perform  any  matrix  operations. 
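The set of affected initial delays in Step 3 of Figure 6.2 can be sketched in Python as follows. This is only an illustration under assumptions that the text does not state explicitly: the interval endpoints are treated as inclusive, delays are taken to range over $1, \ldots, d_{\max}$, and the function name `affected_delays` is hypothetical. Under the inclusive reading each window contains $2P_{\max}$ delays, which agrees with the bound of at most $4P_{\max}$ changes per job.

```python
from typing import List


def affected_delays(d_old: int, d_new: int, p_max: int, d_max: int) -> List[int]:
    """Delays whose cost may change when a job moves from d_old to d_new."""

    def window(d: int) -> range:
        # Inclusive window d - p_max .. d + p_max - 1, clipped to 1 .. d_max.
        lo = max(1, d - p_max)
        hi = min(d_max, d + p_max - 1)
        return range(lo, hi + 1)

    # The two windows may overlap, so take their union.
    return sorted(set(window(d_old)) | set(window(d_new)))


far_apart = affected_delays(5, 20, 3, 100)
overlapping = affected_delays(5, 6, 3, 100)
clipped = affected_delays(1, 2, 3, 4)
```

Only the costs for the delays returned here need to be recomputed and repositioned in each heap; all other heap entries are left untouched.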

Lemma 6.3.6 The algorithm in Figure 6.2 executes an iteration of Improve-Packing in
$O(n^2\mu^2(\mu + \log(m\mu)))$ time.

Proof: The correctness follows from Lemma 6.3.3 and the discussion of this section. We proceed
to bound the running time. By Lemma 6.3.4, Step 2 takes $O(n\mu\log(m\mu))$ time. By
Lemma 6.3.5, updating $c$ takes $O(n^2\mu^3)$ time and requires $O(nP_{\max})$ heap operations that
take $O(n^2\mu^2\log(m\mu))$ time. Step 5 requires one operation in each of $n$ heaps, for a total of
$O(n\log(m\mu))$ time. For Step 6, we need, for each job, to compute the new $\Phi$ that would occur
after reassigning that job. Since $\Phi = y^T b = (y^T \mathbf{1})\log(m\mu)$, $\Phi$ can be computed by summing
all the components of $y$. To compute the change associated with one job, we simulate Steps
1 and 2 of the algorithm and then compare the sum of the values of the $O(P_{\max})$ components
of $y$ that have changed. We can perform this computation in $O(n\mu\log(m\mu))$ time per job
and $O(n^2\mu\log(m\mu))$ time overall. Step 7 can be performed in $O(n)$ time. Putting these steps
together yields the bound of the lemma. ■

Combining  this  lemma  with  Lemma  6.3.1  yields  the  following  theorem: 

Theorem 6.3.7 Problem 6.1.1 can be solved deterministically in $O(n^2\mu^2\log(m\mu)(\mu + \log(m\mu)))$
time.

Finally, combining this theorem with the results of Chapter 5 we get the following theorem.
The other bottleneck in the shop scheduling algorithm is the algorithm of Sevast'yanov [56]
that takes $O((\mu mn)^2)$ time.

Theorem 6.3.8 There exists a deterministic algorithm for job shop scheduling that finds a schedule
of length $O(\log^2(m\mu)\,C^*_{\max})$ in $O((\mu mn)^2 + n^2\mu^2\log(m\mu)(\mu + \log(m\mu)))$ time.

The  running  time  in  Theorem  6.3.8  compares  quite  favorably  with  the  previous  best  bound 
for  this  problem.  The  previous  bounds  involved  using  the  linear  programming  algorithm  of 
Vaidya  [67],  which  takes  ®/i^log(m/i))  for  this  problem,  combined  with  a  deterministic 

version  of  the  randomized  rounding  of  Raghavan  and  Thompson  [52]  and  Raghavan  [50]. 

A glossary of notation, containing symbols that are used frequently, follows. It is divided into three
sections.  The  first  contains  notation  used  in  Chapters  2,  3  and  4  to  describe  multicommodity 
flows.  The  second  contains  notation  used  in  Chapters  5  and  6  to  describe  shop  scheduling.  The 
third  contains  notation  used  only  in  Chapter  6.  Each  section  is  alphabetized,  with  all  Roman 
letters  preceding  all  Greek  letters. 

Chapters  2,  3  and  4 

Symbol                     Meaning

$C_i$                      the cost of the flow for commodity $i$
$C_i^*$, $C_i^*(\lambda)$  the cost of the minimum-cost flow for commodity $i$
$c(vw)$                    the cost of edge $vw$
$D$                        the sum of demands
$D_i$                      $\max_v \{|d_i(v)|\}$
$d_i$                      the demand for commodity $i$
$d_i(v_j)$                 the demand for commodity $i$ at node $v_j$
$d(A, \bar{A})$            the demand for commodities with one endpoint in $A$ and the other in $\bar{A}$
$dist_\ell(v, w)$          the length of the shortest path from $v$ to $w$ with respect to $\ell$
$d_{\max}$                 the maximum demand
$\mathcal{F}$              an instance of a feasible flow problem
I:                         an approximately minimum-cost flow for commodity $i$
I:                         an approximately minimum-cost flow for commodity $i$ that can be represented in $O(\log(nU))$ bits
$f(vw)$                    the flow on edge $vw$
$f_i(P)$                   the flow of commodity $i$ on path $P$
$f_i(vw)$                  the flow of commodity $i$ on edge $vw$
$G = (V, E)$               a graph with vertex set $V$ and edge set $E$
$G_f = (V, E_f)$           the residual graph with vertex set $V$ and edge set $E_f$
$I$                        an instance of a multicommodity flow problem
$I_p$                      the number of productive iterations during a call to Decongest
$I_v$                      the number of unproductive iterations during a call to Decongest
$\mathcal{K}$              a specification of $k$ commodities
$k$                        number of commodities
$k^*$                      the number of distinct sources
$\ell(vw)$                 the length of edge $vw$
M                          an instance of a minimum-cost flow problem
$m$                        number of edges in the network
M                          an instance of a maximum flow problem
$n$                        number of nodes in the network
$P$                        a path
$\mathcal{P}_i$            a collection of paths from $s_i$ to $t_i$
$p(v)$                     the price of node $v$
$s_i$                      the source of commodity $i$
$T_{mcf}$                  the time to find a minimum-cost flow
$T_p$                      the time taken by one productive iteration during a call to Decongest
$T_v$                      the time taken by one unproductive iteration during a call to Decongest
$t_i$                      the sink of commodity $i$
$U$                        maximum capacity
$u(vw)$                    the capacity of edge $vw$
$z$                        a percentage to scale demands by
$\alpha$                   a parameter in the exponent of the value of $\ell$
$\Gamma(A)$                the set of edges with exactly one endpoint in node set $A$, the cut associated with $A$
$\gamma$                   an amount by which each approximate length is scaled
$\epsilon$                 an error parameter; a measure of solution accuracy
$\lambda$                  the maximum edge congestion; congestion
$\lambda^*$                the minimum possible value of $\lambda$
$\lambda_0$                the congestion at the start of procedure Decongest
$\Phi$                     the potential function
$\Phi_i$                   the value of the potential function at the beginning of iteration $i$ of Decongest
$\sigma$                   the fraction of flow to reroute
$\sigma_i$                 the fraction of flow of commodity $i$ to reroute
$\tau$                     a target value of $\lambda$
$\zeta$                    the number of bits of accuracy in each approximate length


Chapters  5  and  6 


Symbol                                                    Meaning

$C_{ij}$                                                  the completion time of operation $O_{ij}$
$C_{\max}$                                                $\max_{ij} C_{ij}$
$C^*_{\max}$                                              the minimum possible completion time
$J = \{J_1, \ldots, J_n\}$                                the set of jobs
$M = \{m_1, m_2, \ldots, m_m\}$                           the set of machines
$m$                                                       the number of machines
$n$                                                       the number of jobs
$O = \{O_{ij} \mid i = 1, \ldots, \mu_j,\ j = 1, \ldots, n\}$   the set of operations
$P_{\max}$                                                the maximum job length
$p_{ij}$                                                  the processing time of the $i$th operation of job $J_j$
$p'_{ij}$                                                 $p_{ij}$ rounded up to the next power of 2
$r_j$                                                     the release date of job $J_j$
$S_i$                                                     a set of identical machines
$\epsilon$                                                an error parameter
$\kappa_{ij}$                                             the machine on which operation $O_{ij}$ runs
$\mu$                                                     the maximum number of operations in any job
$\mu_j$                                                   the number of operations of job $J_j$
u                                                         the total number of operations
$\Pi_{\max}$                                              maximum machine load


Chapter  6 


Symbol             Meaning

$A$                a $p \times q$ non-negative matrix
$a_i$              the $i$th row of $A$
$b_i$              the $i$th entry in $b$
$C_P(y)$           a minimum cost point subject to costs $y$
$c$                costs, $c = y^T A$
$\tilde{c}$        approximate costs, $\tilde{c} = \tilde{y}^T A$
$d$                a value that each component of $x'$ is an integral multiple of
$d'$, $d''$        delay values
$H^j$              a heap associated with job $J_j$
$\mathcal{L}^j$    pointers to values in $H^j$
$M$                the number of packing constraints
$P$                a polytope, a convex set in $R_+$
$V_{jd}$           the vector associated with assigning delay $d$ to job $J_j$
$X^j_d$            a precomputed list to speed the computation of $c^j_d$
$x$                an approximate solution to OPT
$x'$               a point in polytope $P'$
$y$                dual variables
$\gamma$           an amount by which each approximate dual variable is scaled
$\lambda$          $\max_i a_i x / b_i$
$\Phi$             a potential function
$\rho$             $\max_i \max_{x \in P} a_i x / b_i$, the width of $P$ relative to $Ax \le b$
$\rho'$            the width of $P'$ relative to $A'x' \le b$
$\zeta$            the number of bits of accuracy in each approximate dual variable